There are probably few statisticians who have had more friends scattered across the world
than had John Wishart. Many of these friendships were made in the course of his work 
as a teacher of statistical method to practical agriculturalists overseas. It was in the middle 
of teaching work of this kind, which perhaps he had come to enjoy more than any other and
in which he undoubtedly excelled, that he died suddenly on 14 July 1956 while bathing at 
Acapulco on the Pacific coast of Mexico. A pioneering visit to Nanking University in 1934 
had been followed after the war by visits to Spain in 1947, to the United States in 1949, to 
India in 1954 and then, in this last year, to Mexico, where he was taking a leading part in 
the work of the Training Centre in Experimental Design arranged by the United Nations 
Food and Agricultural Organization.

John was the second of the four sons of John and Elizabeth Wishart; he was born in 
Montrose on 28 November 1898, the family moving to Perth in 1900. After schooling 
at Perth Academy he went to Edinburgh University in 1916, finally obtaining a 1st Class
Honours Degree in Mathematics and Natural Philosophy in 1922. His university career
was broken by two years’ service in the Army (1917-19), when he was a subaltern in the
7th Battalion of the Black Watch, and saw active service in France during the last few
months of the war. His four years at the university included a teacher's training course at
Moray House, which led him on to his first post as a mathematics master at the West
Leeds High School (1922-4). Here he met Olive Birdsall, whom he married in 1924. There
were two sons of the marriage.

When on the look-out for mathematical assistants, Karl Pearson had on several occasions
asked E. T. Whittaker for a suitable name among his past Edinburgh students, and it was
with a recommendation of this kind that Wishart came to University College, London, in
the autumn of 1924 to start on a statistical career. The post which he filled was that of
Research Assistant to Pearson, the funds being supplied by the Department of Scientific
and Industrial Research.

The computation of the Tables of the Incomplete Gamma Function had been finished, the
Tables being issued in 1922, and Pearson was already preparing the ground for his next
great computational undertaking, that of the Tables of the Incomplete Beta Function.
It followed that one of Wishart’s main tasks on arriving in London was to get this work on
the Beta function under way. I cannot recall how far he got with the tabling,
which was not finally completed till 1932, but there is no doubt that his
apprenticeship in statistics involved him in much hard computing. His
earliest papers, too, published in Biometrika (1, 2, 5*) were concerned with
approximation to the incomplete Beta function. Both the Gamma and Beta distributions
and their practical uses were a continuing source of interest to him, and one of his
papers published in 1956 (77) was concerned with a new method of approximating to the
integral of the former.

* Here, and below, the numbers are those given in order of date in the Bibliography on pp. 6-8.

I do not think that Wishart regretted his hard training in computing,
started, without machine aid, in Whittaker's mathematical laboratory. Later he
was to play an active part on the British Association's Mathematical Tables Committee
(1928-48), of which he was for some years the secretary, and Biometrika benefited from
the help, advice and criticism which he gave in the recasting of the Biometrika Tables.
During his three years at University College Wishart attended the two main courses of
lectures on mathematical statistics which Pearson was giving annually, and he learned
to handle mathematical problems and put together a scientific paper, but the terms of his
appointment as research assistant precluded any form of teaching. He was a teacher by
temperament and training, and it was, perhaps, the prospect of starting some statistical
classes at a new centre, rather than any marked advance in salary, which led him in the
autumn of 1927 to accept a post as Mathematical Demonstrator at the Imperial College of
Science.

He had scarcely settled in his new post, however, when a second great opportunity came
his way and he found himself at the beginning of 1928 appointed as Statistical Assistant
under R. A. Fisher at Rothamsted. Fisher's scientific output had at this juncture reached a
remarkable tempo, and Wishart had the good fortune to see at close quarters and to share
in the development of much new theory and in the trying out of that theory in practical
experimentation.

His very considerable number of publications during the years 1928-32 illustrate
Wishart’s fields of research. On the more mathematical side, with Fisher’s encouragement,
he broke new ground in 1928 with the derivation of the generalized product-moment
distribution (8) which was to play an important part in the development of multivariate
analysis. A number of his other papers followed from this beginning (e.g. (10, 13)). Other
papers (9, 11, 23, 29) were concerned with the properties of the distribution of the multiple
correlation coefficient and allied distributions which Fisher had derived between 1924 and
1928, using geometrical methods. Wishart took a hand, too, in Fisher’s work of showing
how the impasse in the theory of sampling moments which seemed to have been reached in
the 1920’s, could be broken through by a systematic use of k-statistics and the methods of
combinatorial analysis (12, 21, 24). I well remember how my first gleam of understanding
on partitional problems came from Wishart’s demonstration with match-stick patterns in
his sitting-room at Harpenden.


At the same time Wishart was taking his full share in the Statistical Department's work
of servicing the experimental programme at Rothamsted. Joint papers such as those with
Clapham (15), Fisher (17) and Allan ... a number of simple expository papers
(e.g. (14, 20, 22, 26)). His power of putting matters clearly to his non-mathematical
colleagues was certainly appreciated. As Sir John Russell wrote shortly after Wishart’s death:

It seems only the other day that he first came to Rothamsted to help in the development of what was
then a new and untried subject, of the value of which many were yet uncertain. He threw himself
into the work with energy and enthusiasm, mastered its intricacies and having the gift of exposition
was able to explain its difficulties to the various young people at Rothamsted who wanted to learn
something about it but feared it was beyond their grasp. But he succeeded and he helped many. . . . We
regretted his departure to Cambridge, but of course we knew it was a bigger field giving him more ...

In 1931 Udny Yule decided to give up his post of University Lecturer in Statistics at
Cambridge which he had held since 1912, and a committee of the University recommended 
some reorganization of the teaching work. A Reader* was to be appointed in the Faculty 
of Agriculture, who should also do some teaching in the Faculty of Mathematics, while 
Economic Statistics was to be the concern of a separate Lecturer. Wishart was appointed 
to take up the Readership in October 1931; there was no one more suited for the post. He 
had behind him more than three years' experience of applying the new statistical methods in agricultural
experimentation and of explaining their meaning to the non-mathematician; he had received 
an apprenticeship in mathematical statistics under both Karl Pearson and R. A. Fisher; 
he had a shrewd, objective judgement which enabled him to keep clear of controversy and 
to combine what he thought good from the older and newer schools of thought; and, finally, 
he was ready to take any amount of trouble in developing lecture and practical courses 
for new types of student audience. 

He was given accommodation in the School of Agriculture, where in addition to his own 
room he had a larger ‘laboratory’ containing some half dozen desks for post-graduate 
students. He began his teaching with a general course on statistical methods offered in the 
Faculty of Agriculture and a course on mathematical statistics which could be offered for 
Schedule B of the Mathematical Tripos. There was also an optional practical class associated 
with the second course. In the eight years which intervened before the war, the number of 
mathematical students who attended his Tripos course and in many cases stayed on for 
post-graduate work increased steadily. This drawing of Cambridge-trained mathematicians 
into the field of statistics had a profound effect on the rate of development of the subject 
in this country, and it was Wishart who played the essential part in achieving this result. 

Impressions of these days left on his early students are interesting. As is inevitably the 
case when statistics teaching can only be fitted in as one subject in a mathematical syllabus, 
the Schedule B course was rather overcrowded. It was in the more leisurely conditions of 
full-time post-graduate study that Wishart's teaching was most effective. A characteristic
aspect of the research laboratory has been described by W. G. Cochran: 


In those days he believed in his students keeping office hours. When he assigned me a desk in the
Lab., he told me that he expected me to be sitting at the desk most of the day when not in class. He
instructed me to do three hours computing a day on a table of the 1 % level of z to 7 decimal places,
while Fisher in the meantime was computing the 5 % table. Having anticipated a free and easy life
as a graduate student, punctuated of course by periods of esoteric thinking when the spirit moved me,
I didn't much like either the office hours or the computing, but I don’t think they did me any harm. 


In asking post-graduates to keep regular hours for statistical work and later on in 
expecting somewhat similar regularity from his lecturers, he was following, perhaps un- 
consciously, Pearson’s tradition established at University College. It was a routine less 
readily accepted at Cambridge, especially on the mathematical side. Another custom in 
which he followed K. P. was in making daily rounds of the laboratory to see how his students 
were getting on and to discuss any difficulties with them. 

H. O. Hartley was a student a year or two after Cochran, and writes:

John Wishart’s strength as a teacher of applied statistics was two-sided. He had an amazing gift to
inspire a very theoretical mathematician, like myself, to learn something about applications and to
bring him into contact with the agriculturalist and biologist. He was also successful, although less so,
in converting the latter to see the need of mathematical statistics. He would walk into the laboratory
and look over your shoulder as you were struggling with an analysis of variance, point to a figure and say
‘this one looks wrong’, and would usually be infallible in this. He had a computational ...
for approximations and took a great interest in computing methods, tables and ... He
was an ideal teacher of statistical analysis, but not quite as convincing as a teacher of statistical ...
He had a good humoured gift to learn from failure and tell the stories against himself for others to
learn from.

* Yule was made Reader for the last few months of his appointment.


A feature of this Cambridge development in the 1930's was the location of the statistical
laboratory in the School of Agriculture, with agricultural and mathematical students
working more or less side by side. ‘This was, if you like’, M. S. Bartlett points out, ‘a reflexion
of Wishart’s dual qualifications; in those early days at least it was a very happy combination
which set the tone for a balanced outlook on the theory of statistics, which was always to
be found in his department.’

It has been remarked that Wishart did not produce much original research in these pre-war
years, and it is true that if we look at his list of publications, the papers written between
1933 and 1939 were almost entirely of an applied or expository character. But original
mathematical research is not the only mark of a good statistician. We should remember
that until Bartlett was appointed in 1938 as a mathematical lecturer to help in the statistical
work, Wishart was teaching single-handed. At the same time he was acting as consultant
in statistical matters in connexion with the experimental work of various Cambridge
research institutes. It was a time too when much hard spadework in the way of relatively
simple exposition was essential to convince the experimentalists at Cambridge and
elsewhere of the value of the new statistical tools. This was a type of work that he found most
congenial and he threw himself into it to the full.


The Industrial and Agricultural Research Section of the Royal Statistical Society was
founded in 1933 and Wishart was one of the six Fellows of the Society who formed the
original organizing committee. He gave the second paper to the section in 1934 on ‘Statistics
in Agricultural Research’ (35), and later in the same year provided a long ‘Bibliography of
Agricultural Statistics, 1931-33’ (36) for volume 1 of the I.A.R.S. Supplement. Indeed, he
continued to play an active part in the section's first six years of very vigorous life.

If the particular organization of statistics at Cambridge worked admirably to start with,
it did contain the seeds of later trouble. As the agriculturalists ... agriculture became less
apparent ... in outlook as any in this respect ... There were no doubt faults on all sides ...
his position was not altogether an easy one ... when war came he was glad to be able to throw
himself into another ... would free him from more academic battles.


: Mie 
From May 1940 to February 1942 he was in the Army as a Captain employed on Intelligence
work, and then, until 1946, he was occupied with statistical work for the Admiralty
in London and Bath, working with the rank of Assistant Secretary in the Establishment,
the Production and the Naval Personnel Divisions of the Secretary's Department. When
the war ended he hesitated for some time whether to return to Cambridge, but no doubt
the great opportunity of developing the work he had started there drew him back to 
university teaching. 

On the mathematical side Wishart was now able to make great progress. Additional 
members of the Mathematics Faculty were appointed who were to be primarily concerned 
with statistical teaching, so that by the early 1950’s the Reader could call on the help of 
three lecturers and an assistant lecturer. Part II of the revised Mathematical Tripos
contained a course on Random Variables which aimed at preparing the way for fuller statistical
work in Part III.* Additional post-graduate courses were offered and, finally, to give more
adequate training on the applied side, there was a Diploma course which required the post- 
graduate student to spend a considerable amount of his time on the application of theory 
in a selected field. The Diploma courses at Oxford, Manchester and Aberdeen have followed 
the Cambridge experiment on rather similar lines. 

Approval was given for a Statistical Laboratory to be set up within the Mathematics 
Faculty, and after some delay due to building difficulties this was housed in a temporary 
building in St Andrew's Hill. When it was opened in the summer of 1949, Wishart could 
look back with some satisfaction at the success which he had had in grafting the teaching of
statistics on to the Cambridge mathematical system. Yet recognition of his own position 
still came rather grudgingly, for it was not until 1953 that his appointment as Director of 
the Statistical Laboratory was confirmed. It began to be realized, too, as had been found 
elsewhere, that a statistical unit could be of better service to a university if it were free from 
the control of mathematics. 

In his post-war published work the emphasis was again on exposition, but this was now 
mainly in the mathematical rather than the agricultural field. Thus between 1947 and 1951 
at a time when the Institute of Actuaries was introducing additional statistical matter into 
its revised examination syllabus, he wrote four useful papers (54, 55, 59, 65) on different 
aspects of statistical theory for the Journal of the Institute's Students' Society. New
contributions to theory include three Biometrika papers (53, 62, 67) dealing with his old favourite,
the development of moment and cumulant theory. In the applied field, perhaps the most
important contribution was a largely rewritten Part I to Wishart & Sanders’s Principles
and Practice of Field Experimentation (75). This admirable little book, translated into 
Spanish, formed the basis of his 1956 F.A.O. course in Mexico. 

If his output of new scientific work was small, Wishart's last ten years were very active 
ones. In addition to the work of building up the Cambridge Laboratory, he was made the 
first Chairman of the Royal Statistical Society’s Research Section, when it branched off 
from the Industrial and Agricultural Research Section in 1945. He also took a leading part 
in the Society’s two post-war committees on the Teaching of Statistics. Made an Assistant 
Editor of Biometrika in 1937 and Associate Editor in 1948 he played a loyal and diligent 
part in maintaining the standard and tradition of this Journal. 

With able colleagues available in Cambridge it was possible for him to make use of the
University’s newly established sabbatical year and to find other opportunities for going
overseas. Thus he was in the United States for nine months in 1949, lecturing at the
Universities of North Carolina and California. In the autumn of 1954 he lectured for three
months at the F.A.O.'s Training Centre in Experimental Design and the Survey ...
of Experimentation held at New Delhi. Finally, after going to Mexico in January as a
technical assistance expert to the Mexican Government, he was responsible for the
day-to-day running of a similar F.A.O. course in Mexico City from April until his death.

A number of warm tributes have been paid to the value of his work on these
courses. There were many elements in his success: the carefully planned lectures, his way
of getting them across and his skill in getting back to reality after short excursions into the
abstract. As a colleague of his on the Indian course remarks: ‘Few of his students will
forget his injunction not to substitute covariance for weed control and other ...
good husbandry.’ But apart from work of a more formal character, he had, and showed that
he had, a real concern with the particular difficulties and problems of each student and
colleague. His interest in students from other lands had started in his Rothamsted days
and continued at Cambridge, both in the hospitality which he and his wife showed to foreign
visitors in their home and through the activities of the Cambridge All Peoples Association.
In both India and Mexico he was eager to learn about the art and architecture, the history
and customs, of the people around him, and he learned enough Spanish to make his
inaugural speech at the Central American Training Centre in this language. His friendliness
and obvious enjoyment in all that was happening made a notable contribution to the
success of these gatherings.

* A full description of the courses available in 1955 is given as an Appendix to Wishart's
contribution (72) to the discussion on the Teaching of Statistics held by the Royal Statistical
Society in February 1955.

If we look back in retrospect we shall realize, I think, that John Wishart’s contribution
to our subject has lain not so much in any outstanding piece of mathematical research or
the development of new concepts, but in the many-sided and very necessary part which
he has played in the dissemination of statistical ideas and the application of statistical
methods during the last 30 years. Whether it was in early trying out of the analysis of
variance in the field, in convincing the agriculturalist that the principles of experimental
design were not too hard after all to understand, in exploiting new techniques to the general
advantage of an expanding methodology, in building up a teaching and research organization
at Cambridge or in extending the activities of the Royal Statistical Society, he was there
doing his share of work to the full.


RESTRICTED SEQUENTIAL PROCEDURES


By P. ARMITAGE 
Statistical Research Unit of the Medical Research Council, 
London School of Hygiene and Tropical Medicine 


1. INTRODUCTION 


When sequential methods are advocated for industrial inspection work, the argument is 
usually advanced on grounds of economy. A given degree of discrimination between ‘good’ 
and ‘bad’ quality is achieved, by a sequential procedure, with a smaller average number of 
observations than that required by an equivalent non-sequential procedure. This may well 
be a relevant criterion when the sampling inspection is to be repeated frequently as a routine 
operation. In many fields of experimentation this argument carries little weight, because 
the uncertainty in knowing how long a particular sequential experiment will continue may 
outweigh any possible long-term economy. There is, however, a rather different reason 
why sequential procedures of some sort may be suitable for many trials in clinical and 
preventive medicine (which are essentially classical randomization experiments to compare 
different medical treatments, with human subjects as experimental units). These trials are 
usually sequential in the trivial sense that the subjects enter the experiment at different 
points of time, but this fact would not necessarily suggest sequential methods of design and 
analysis. The important point is that a doctor responsible for patients in a trial will frequently 
regard it as unethical to continue the trial if he is convinced that a difference between the 
effects of two treatments has appeared, because this would mean withholding from a patient 
under his care a treatment which he regarded as better than that being given. This con- 
sideration will be most cogent when life and death are involved; it may not apply at all if 
the illnesses being treated are fairly trivial, but it will usually be present to some extent. 
There is, therefore, a natural tendency for the organizers of a clinical or preventive trial to 
examine the observations at various points of time, and to stop or modify the trial if a 
difference between treatment effects is apparent. Sequential experimentation of this sort, 
however reasonable, will to some extent affect the validity of any probability statements 
involving integration over sample space (Anscombe, 1954), and it is natural to investigate 
the possibility of carrying out the trial according to some well-defined sequential procedure 
for which valid probability statements can be made.

In an earlier paper (Armitage, 1954) I suggested that comparisons of two treatments 
could be made by fairly straightforward modifications of Wald’s methods. Means could be 
compared by two-sided sequential t-tests (National Bureau of Standards, 1951), and pro- 
portions could be compared by a two-sided analogue of Wald’s one-sided test for comparative 
trials. (The test for proportions has been proposed also by de Boer (1953).) These procedures 
retain the feature, common to all probability-ratio sequential tests, that the number of
observations required before a boundary is reached is unlimited. This property is in practice 
unattractive, and although a Wald sequential procedure can be truncated the effects of 
truncation are not generally known. To preserve, even approximately, the nominal ‘risks
of error’ the truncation must be applied after a fairly large number of observations, and
the distributions of the sample number still have very high variability (for examples of
such distributions, see Baker, 1950). Such variability discourages the use of this type of
procedure for medical trials, since it is clearly inconvenient to plan a trial the duration
of which has an expectation of between, say, 2 and 9 months (depending on the actual
difference between the effects of the treatments), but which may in an individual instance
have to be continued for as long as 3 years.

The type of procedure which seems to be required is one which incorporates truncation
as an integral part of the scheme, which allows much less variability of sample number
than do Wald schemes, and for which some at least of the probabilistic properties are known.
Bross (1952) constructed some closed sequential designs for the comparison of two
proportions, but although they satisfy the criteria stated above to be desirable, each design
had to be constructed specially, to some extent by trial and error, and only two have so far
been published. In the present paper I consider a class of closed sequential procedures,
which I have called ‘restricted sequential procedures’. These seem to satisfy the above
requirements, but their probabilistic properties depend on a diffusion approximation, the
adequacy of which has not been fully investigated. The main feature of the method is that
the sample number cannot be greater than some predetermined value, $N$. As sampling
proceeds, a sample path may be drawn on a diagram, and sampling will stop after less than
$N$ observations if one of two ‘outer boundaries’ is crossed. These outer boundaries are
determined by the requirement that the likelihood ratio of either of two alternative
hypotheses, to the null hypothesis, takes some preassigned high value. It may be arranged that
the probability, on the null hypothesis, that the sample path will stop on an outer boundary
takes a predetermined small value, $2\alpha$. If the procedure is used to provide a two-sided test
of the null hypothesis, sample paths reaching the outer boundaries will then correspond
to results significant at the $2\alpha$ level; those failing to reach the outer boundaries before
$N$ observations will correspond to non-significant results. The greater the apparent departure
from the null hypothesis, the sooner will one of the outer boundaries be hit, and sampling
brought to a close, a feature which seems to accord with medical ...

These procedures were developed with medical applications in mind, but they may be
of some use in industrial ... it may be used for the estimation of an unknown
parameter; or it may be used in more than ... For an account of a clinical trial conducted ...


2. NORMAL DISTRIBUTION WITH KNOWN VARIANCE

2·1. Suppose that observations $x_i$ ($i = 1, 2, \ldots$) are drawn at random from a normal
distribution with unknown mean $\mu$ and known variance $\sigma^2$. Let

$$y_n = \sum_{i=1}^{n} x_i.$$

‘Restricted sequential procedures’ will consist in sampling until one of the three following
boundaries is reached:
(i) the ‘upper boundary’, $U$: $y_n = a + bn$ ($a > 0$);
(ii) the ‘lower boundary’, $L$: $y_n = -a - bn$ ($a > 0$);
(iii) the ‘middle boundary’, $M$: $n = N$.

$U$ and $L$ will be called ‘outer boundaries’. In order that neither outer boundary crosses
$y_n = 0$ for $n \le N$ we shall require $a + bN > 0$.
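
The stopping rule just described can be sketched in code. What follows is a minimal illustration in Python and not part of the original paper: the function name `run_restricted_procedure`, its arguments, and the convention that an outer boundary reached exactly at $n = N$ counts as a stop on $U$ or $L$ are all assumptions made for the sketch.

```python
import random


def run_restricted_procedure(mu, sigma, a, b, N, rng=None):
    """Sample x_i ~ Normal(mu, sigma^2) until U, L or M is reached.

    Returns (boundary, n), where boundary is "U", "L" or "M".
    Hypothetical helper; a > 0 and a + b*N > 0 are assumed, not checked.
    """
    rng = rng or random.Random()
    y = 0.0  # cumulative sum y_n
    for n in range(1, N + 1):
        y += rng.gauss(mu, sigma)
        if y >= a + b * n:    # upper boundary U: y_n = a + bn
            return "U", n
        if y <= -a - b * n:   # lower boundary L: y_n = -a - bn
            return "L", n
    return "M", N             # middle boundary M: n = N


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    print(run_restricted_procedure(mu=0.5, sigma=1.0, a=3.0, b=0.25, N=40, rng=rng))
```

Repeating such runs many times gives a Monte Carlo check on the diffusion approximation, since the simulated walk moves in discrete steps and can overshoot a boundary.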

Suppose first that only $U$ and $M$ are present. If the discrete steps in $y_n$ are replaced by
continuous movement in time, where the time unit corresponds to a single observation,
the random walk may be approximated by the one-dimensional diffusion process with drift
$\mu - b$ per unit time, growth in variance at a rate $\sigma^2$, and an absorbing barrier at $a$. Hence, the
probability of crossing $U$ with not more than $N$ observations is approximated to by the
probability of absorption before time $N$, which is given (Bartlett, 1946) by

$$P_1(\mu, N) = 1 - F\left(\frac{a - (\mu - b)N}{\sigma\sqrt{N}}\right) + \exp\left\{\frac{2a(\mu - b)}{\sigma^2}\right\} F\left(\frac{-a - (\mu - b)N}{\sigma\sqrt{N}}\right), \qquad (1)$$

where $F(u) = \int_{-\infty}^{u} (2\pi)^{-1/2} e^{-t^2/2}\,dt$. This formula is valid for $a > 0$, as is appropriate here. The
same approach to this problem is suggested by Page (1957).
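
Formula (1) is easy to evaluate numerically. The sketch below is an illustration only: the names `normal_cdf` and `absorption_probability` are hypothetical, and the normal integral $F$ is computed from the standard library's complementary error function.

```python
import math


def normal_cdf(u):
    """F(u): the standard normal distribution function."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def absorption_probability(mu, N, a, b, sigma):
    """P_1(mu, N) of formula (1): diffusion approximation to the
    probability of crossing U within N observations (L ignored)."""
    drift = mu - b                 # drift relative to the sloping boundary
    scale = sigma * math.sqrt(N)
    return (1.0 - normal_cdf((a - drift * N) / scale)
            + math.exp(2.0 * a * drift / sigma ** 2)
            * normal_cdf((-a - drift * N) / scale))


if __name__ == "__main__":
    # With zero relative drift (mu = b) the formula reduces to
    # 2 * (1 - F(a / (sigma * sqrt(N)))), the reflection-principle result.
    print(absorption_probability(mu=0.5, N=1, a=1.96, b=0.5, sigma=1.0))
```

The zero-drift case is a convenient sanity check, because the exponential factor is then 1 and the two normal terms are equal.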

If L is now introduced, the probability of absorption on U is affected. The expression (1) 
should be reduced by the probability that the path of the diffusing particle crosses first 
L and then U before n = N. This effect appears to be less important with the convergent 
or divergent boundaries considered here than it is with the Wald type of parallel-line 
boundaries. 

For divergent boundaries (when $b > 0$), the probability that a path, starting at a point
on $L$, will cross $U$ before $n = N$, is less than it would be for a path starting at the point
($n = 0$, $y_n = -a$). For such a path the probability of crossing the line $y_n = bn$ before $n = N$
is equal to $P_1(\mu, N)$, and the conditional probability of then crossing $U$ before $n = N$ is
less than $P_1(\mu, N)$. The total probability is therefore less than $\{P_1(\mu, N)\}^2$. But the
probability that a path starting at the origin will cross $L$ before $n = N$ is less than $P_1(-\mu, N)$.
Hence, an upper bound to the probability that a path crosses $L$ and then $U$ before $n = N$ is
$P_1(-\mu, N)\{P_1(\mu, N)\}^2$, and an upper bound to the proportionate error (i.e. the ratio of the
error to the nominal value $P_1(\mu, N)$) is $P_1(-\mu, N)\,P_1(\mu, N)$. For particular choices of $a$, $b$
and $N$ this upper bound may not be very small, but in the applications discussed below
it will be shown to be small.
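
The size of this bound can be checked numerically for any particular $a$, $b$ and $N$. The sketch below is illustrative only: the function names are hypothetical, and the numerical design in the example (boundaries with $a = \ln 19$, $b = 0·5$, $\sigma = 1$ and $N \approx 15·4$) is an assumed one chosen for the demonstration.

```python
import math


def normal_cdf(u):
    """F(u): the standard normal distribution function."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def absorption_probability(mu, N, a, b, sigma):
    """P_1(mu, N) as given by formula (1)."""
    drift = mu - b
    scale = sigma * math.sqrt(N)
    return (1.0 - normal_cdf((a - drift * N) / scale)
            + math.exp(2.0 * a * drift / sigma ** 2)
            * normal_cdf((-a - drift * N) / scale))


def proportionate_error_bound(mu, N, a, b, sigma):
    """Upper bound P_1(-mu, N) * P_1(mu, N) on the proportionate error
    made by ignoring the lower boundary L (divergent case, b > 0)."""
    return (absorption_probability(-mu, N, a, b, sigma)
            * absorption_probability(mu, N, a, b, sigma))


if __name__ == "__main__":
    # Assumed example design; at mu = 0 the bound is {P_1(0, N)}^2.
    print(proportionate_error_bound(0.0, 3.92 ** 2, math.log(19.0), 0.5, 1.0))
```

For this assumed design the bound at $\mu = 0$ comes out at roughly 0·0025, that is, a proportionate error of well under one per cent.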

No analogous upper bound has been obtained for convergent boundaries, and in most of
the following discussion we therefore assume that $b > 0$.

2·2. The specification of the boundaries requires three arbitrary constants: $a$, $b$ and $N$.
In general, therefore, a set of boundaries will not be determined uniquely by the
requirement that the procedure should satisfy only one or two independent conditions. Consider
first the problem of determining values of $a$, $b$ and $N$ satisfying the following two conditions:

(a) On the hypothesis H, that x = 0, the probabilities that sampling will end on each of 
the outer boundaries are «<4 (these probabilities clearly being equal by symmetry). If 
the procedure is used to provide a significance test of Hy, a sample path ending on one of the 
outer boundaries may then be regarded as significant at a level 2a. i i 

(b) On the hypothesis H, that x = 44> 0, the probability of ea U isl =p (>a). 
(By symmetry, 1 — f will also be the probability of reaching L on the hypothesis H_ that 
=f) 

Since only 
expect them to determine a family of restricted 
Solution below provides one design out of this family. 


two restrictions have been imposed on the three constants a, b and N, we should 
sequential designs. The approximate 
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(2) 


Choose a= 6 In " b= A. 
Then, on $U$, the likelihood ratio of $H_+$ to $H_0$ is $(1-\beta)/\alpha$. Whether $L$ is present or not, the
probabilities, on $H_+$ and $H_0$, of reaching any specified set of points on $U$ will be in the ratio
$1-\beta$ to $\alpha$ (neglecting the overshooting of the boundaries). In particular, if $P(\mu, N)$ is the
probability of absorption on $U$ in less than $N$ observations, no other boundary having
previously been reached, and we choose $N$ such that $P(\mu_1, N) = 1-\beta$, it will follow that
(again neglecting the overshooting of the boundaries) $P(0, N) = \alpha$. Conditions (a) and (b)
will then be satisfied. Approximating to $P(\mu_1, N)$ by $P_U(\mu_1, N)$, given by (1), we require

\[ 1-\beta = 1 - F\left\{\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N\right\}
 + \frac{1-\beta}{\alpha}\,F\left\{-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N\right\}, \tag{3} \]

where $\Delta = \mu_1/\sigma$. Since $\alpha$, $\beta$ and $\Delta$ are known, (3) may be solved for $N$ by successive
approximation. In fact, for given $\alpha$ and $\beta$, (3) may be solved for $\Delta\sqrt N$; the solutions, $\Delta\sqrt{N_1}$, for
various combinations of $\alpha$ and $\beta$ are shown in Table 1.
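The successive approximation mentioned for (3) can be sketched numerically. The following is a minimal illustration, not the author's method: it substitutes plain bisection, writes x for $\Delta\sqrt N$, and assumes $\beta < 1-\alpha$; the function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function F(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def eq3_residual(x: float, alpha: float, beta: float) -> float:
    """Right-hand side of (3) minus its left-hand side, with x = Delta * sqrt(N)."""
    c = math.log((1.0 - beta) / alpha)
    rhs = (1.0 - normal_cdf(c / x - x / 2.0)
           + (1.0 - beta) / alpha * normal_cdf(-c / x - x / 2.0))
    return rhs - (1.0 - beta)


def solve_delta_sqrt_n(alpha: float, beta: float,
                       lo: float = 1e-3, hi: float = 50.0) -> float:
    """Bisection for Delta * sqrt(N_1); this sketch assumes beta < 1 - alpha."""
    if not beta < 1.0 - alpha:
        raise ValueError("this sketch only covers beta < 1 - alpha")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The crossing probability grows with x, so the residual is
        # negative below the root and positive above it.
        if eq3_residual(mid, alpha, beta) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for alpha, beta in ((0.050, 0.10), (0.025, 0.05)):
        print(alpha, beta, round(solve_delta_sqrt_n(alpha, beta), 2))
```

Dividing the returned value by $\Delta$ and squaring gives $N_1$; the bracketing interval is an arbitrary choice that comfortably contains the tabulated range.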


Table 1. Normal distribution with known variance: values of $\Delta\sqrt{N_1}$
for various values of $\alpha$ and $\beta$

              $\alpha$
$\beta$     0.050    0.025    0.005
0.10        3.41     3.72     4.32
0.05        3.92     4.22     4.80
0.01        4.95     5.33     5.79


The use of the single-boundary diffusion approximation (1) is justified since the upper
bound to the proportionate error in $\alpha$ (and hence in $1-\beta$) is $\{P_U(0, N)\}^2 = \alpha^2$. In practice
$\alpha$ will usually be chosen to be fairly small (say 0.05 or less), since in the practical applications
envisaged at the moment $\alpha$ may be regarded as the (one-sided) significance level at which
one wishes to use less than the maximum sample number, $N$. The other approximation
involved, that of ignoring the overshooting of the boundaries, is of the same type as that
involved in the formulae customarily used in Wald's sequential tests. Wald's formulae also
can be obtained by diffusion theory (Bartlett, 1946), and it would seem a reasonable conjecture
that the accuracy of the approximation in the present schemes would be similar
to that of Wald's approximations.

One other member of the family of restricted sequential procedures satisfying (a) and (b)
is the degenerate case obtained by putting $a = \infty$, $b = -\infty$, $N = N_0$ and $a + bN = u_\alpha\sigma\sqrt{N_0}$,
where
\[ F(u_\alpha) = 1-\alpha, \qquad F(u_\beta) = 1-\beta, \]
and
\[ \Delta\sqrt{N_0} = u_\alpha + u_\beta. \]

This is the fixed-sample-size procedure satisfying (a) and (b). Since the procedure with
$N = N_1$ and the fixed-sample-size procedure with $N = N_0$ both satisfy (a) and (b), and since
the former permits sampling to stop for $n < N_1$, we should expect to find $N_1 > N_0$. The ratios
$N_1/N_0$ are given in Table 2 for various combinations of $\alpha$ and $\beta$.

Other members of the family of procedures satisfying (a) and (b) could be found by trial
and error, but no attempt to do this has yet been made.
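The comparison with the fixed-sample-size procedure reduces to simple arithmetic once $\Delta\sqrt{N_1}$ is known. The sketch below assumes the relation $\Delta\sqrt{N_0} = u_\alpha + u_\beta$ quoted in the text and takes a rounded Table 1 entry as input, so the ratio it returns agrees with Table 2 only to about two decimals; the function names are illustrative.

```python
from statistics import NormalDist


def fixed_sample_delta_sqrt_n0(alpha: float, beta: float) -> float:
    """Delta * sqrt(N_0) = u_alpha + u_beta for the fixed-sample-size procedure."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(1.0 - alpha) + std_normal.inv_cdf(1.0 - beta)


def sample_size_ratio(delta_sqrt_n1: float, alpha: float, beta: float) -> float:
    """N_1 / N_0, given a (rounded) tabulated value of Delta * sqrt(N_1)."""
    return (delta_sqrt_n1 / fixed_sample_delta_sqrt_n0(alpha, beta)) ** 2


if __name__ == "__main__":
    # Table 1 entry for alpha = 0.025, beta = 0.05.
    print(round(sample_size_ratio(4.22, 0.025, 0.05), 2))
```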


Table 2. Normal distribution with known variance: values of $N_1/N_0$
for various values of $\alpha$ and $\beta$

              $\alpha$
$\beta$     0.050    0.025    0.005
0.10        1.355    1.313    1.252
0.05        1.417    1.368    1.295
0.01        1.553    1.490    [illegible]


2.3. It might in practice be more convenient to specify in advance the maximum sample
number $N$, rather than the probability $\beta$. We therefore consider the problem of determining
values of $a$ and $b$ satisfying condition (a) as before, given that the maximum sample number
is $N$. Again, we should expect a family of procedures satisfying these conditions.

Since $\alpha$ and $N$ are known, equation (3) may be solved for $\Delta$ by successive approximation
for any chosen value of $\beta$. Denote the solution by $\Delta(\beta)$. Each pair of values of $(\beta, \Delta(\beta))$
determines the value of $a$ and $b$. A family of procedures may therefore be generated by
allowing $\beta$ to assume different values. We shall here relax the condition required in §2.2,
that $\beta < 1-\alpha$. This is a natural requirement to impose if the boundaries are to be determined
by considerations of power, but it is instructive to consider the wider class of boundaries
obtained by allowing $\beta$ to vary between 0 and 1. In §2.4 we prove that $\Delta(\beta)$ is a single-valued
monotonic decreasing function of $\beta$, ranging from $+\infty$ to $-\infty$ as $\beta$ ranges from 0 to 1. For
$\beta \lessgtr 1-\alpha$, $\Delta(\beta) \gtrless 0$. Hence, for all $\beta$, the solutions satisfy
\[ a > 0, \]
as required. (This result is intuitively plausible, since $(1-\beta)/\alpha$ is the likelihood ratio, at
points on $U$, of the hypothesis that $\mu = \Delta(\beta)\sigma$ to the hypothesis that $\mu = 0$. It is therefore
reasonable that $(1-\beta)/\alpha > 1$ if $\Delta(\beta) > 0$ and $(1-\beta)/\alpha < 1$ if $\Delta(\beta) < 0$.) It is also proved in
§2.4 that $a + bN > u_\alpha\sigma\sqrt N > 0$; the requirement of §2.1 that $a + bN > 0$ is thus satisfied. For
$\beta > 1-\alpha$, $b\,(= \sigma\Delta(\beta)/2) < 0$ and the boundaries are convergent. Since, for convergent
boundaries, no upper bound is available for the proportionate error involved in using the
single-boundary diffusion theory, the procedures obtained for values of $\beta > 1-\alpha$ are at
present of less practical interest than those for $\beta < 1-\alpha$. Table 3 gives values of $a/\sigma$, $b/\sigma$
and $\Delta(\beta)$ for various values of $\beta$, in the particular case where $\alpha = 0.025$, $N = 50$. These
values were obtained by successive approximation. The corresponding upper boundaries
are illustrated in Fig. 1.

The limiting forms as $\beta \to 0$ and $\Delta(\beta) \to \infty$ are of little practical interest, because when the
slope of the outer boundary is very steep the total probability of hitting the upper boundary,
$\alpha$, is largely concentrated at very low values of $n$, and in these circumstances the diffusion
process (which allows the boundary to be crossed for any $n > 0$) is a poor approximation to
a process involving only integral $n$.

The horizontal boundary obtained for $\beta = 1-\alpha$ is of some interest, as procedures with
one horizontal boundary have been discussed by Rao (1950). The value of $a$ is indeterminate,
but the limiting value, $a_0$, is shown in §2.4 to satisfy the equation
\[ F(a_0/\sigma\sqrt N) = 1 - \tfrac12\alpha. \tag{4} \]


Table 3. Normal distribution with known variance: values of $a/\sigma$, $b/\sigma$ and $\Delta(\beta)$ for restricted
sequential procedures with $\alpha = 0.025$, $N = 50$, and various values of $\beta$ (cf. Fig. 1)

$\beta$     $a/\sigma$    $b/\sigma$    $\Delta(\beta)$
0.001       4.02          0.458         0.916
0.010       4.97          0.370         0.740

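[Fig. 1. Figure not reproduced. Horizontal axis: number of observations, $n$, from 0 to 50. The
surviving caption text states that the boundaries are labelled by the parameter $\beta$, the values of
which are shown in the diagram, and refers to the fixed-sample-size procedure.]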


In the notation of §2.2, $a_0 = u_{\frac12\alpha}\sigma\sqrt N$. The boundary therefore meets $n = N$ at a point
corresponding to a fixed-sample-size significance level of $\tfrac12\alpha$, on a one-sided test. The upper
bound to the proportionate error, derived for divergent boundaries, is readily seen to be
valid also for horizontal boundaries. Moreover, an exact expression is available for the
distribution of absorption time in the diffusion problem with two parallel boundaries
(e.g. Feller, 1950, p. 304, Problem 7). In this particular case the upper bound for the
proportionate error appears to be unnecessarily wide. For instance, if the boundaries are
set at $y_n = \pm a_0$, where $a_0$ is given by (4) with $\alpha = 0.025$, the probability, on $H_0$, of absorption
before $n = N$ falls short of 0.05 by less than $10^{-12}$.

Consider now the effect of replacing the diffusion process by discrete sampling, in the
situation considered by Rao. Then, if $U$ is horizontal and $L$ is omitted (as in Rao's procedure),
$a_0$ (defined by (4)) provides a rather close upper bound to the exact value $A_0$
required to satisfy given values of $\alpha$ and $N$. For, if the equation of $U$ is $y_n = a_0$, any sample
path which, if continued to $n = N$, satisfies $y_N > a_0$, must cross $U$ at some $n \le N$. Consider
a path which first crosses $U$ at $n = n_0 < N$. On hypothesis $H_0$, the probability that such a
path, when continued to $n = N$, will satisfy $y_N > a_0$ is slightly greater than $\tfrac12$, for all $n_0$.
(It would equal $\tfrac12$ if the paths did not overshoot the boundary.) If the probability, on $H_0$,
of first crossing $U$ at $n = n_0$ is $\alpha(n_0)$, then the probability that $y_N > a_0$ is slightly greater than
\[ \tfrac12\sum_{n=1}^{N-1}\alpha(n) + \alpha(N), \]
which is slightly greater than
\[ \tfrac12\sum_{n=1}^{N}\alpha(n). \]
But, by the definition of $a_0$, the probability that $y_N > a_0$ is $\tfrac12\alpha$. Hence
\[ \alpha > \sum_{n=1}^{N}\alpha(n), \]
and, in order to replace the inequality by an equality, $a_0$ must be reduced slightly. Hence
$a_0$ provides a fairly close upper limit to $A_0$, rather better than that given by Rao (1950),
as may be seen from Table 4.


Table 4. Normal distribution with known variance: procedure with single boundary at $y_n = A_0$,
restricted at $n = N$. Upper bounds for $A_0/\sigma\sqrt N$, for various values of $\alpha$

               Upper bound given by
$\alpha$     $u_{\frac12\alpha} = a_0/\sigma\sqrt N$     Rao, p. 365
0.050        1.9600                                        2.0626
0.025        2.2414                                        2.3376
0.010        2.5758                                        2.6655
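The first column of upper bounds in Table 4 follows directly from (4) as a normal quantile. A minimal sketch, using only the Python standard library (the function name is illustrative):

```python
from statistics import NormalDist


def horizontal_boundary_bound(alpha: float) -> float:
    """a_0 / (sigma * sqrt(N)) from (4): F(a_0 / (sigma * sqrt(N))) = 1 - alpha / 2."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


if __name__ == "__main__":
    for alpha in (0.050, 0.025, 0.010):
        print(alpha, round(horizontal_boundary_bound(alpha), 4))
```

Multiplying the result by $\sigma\sqrt N$ gives the height $a_0$ of the horizontal boundary for a chosen maximum sample number.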


The other limiting form, obtained as $\Delta(\beta) \to -\infty$ and $\beta \to 1$, is the fixed-sample-size
procedure, with the upper boundary consisting of those points significant at a probability
level $\alpha$, on a one-sided test. This is proved in §2.4. It may be noted that in this case
the proportionate error introduced by using the one-boundary diffusion approximation is
zero, even though $b < 0$.


The choice of a restricted sequential procedure, for given values of $\alpha$ and $N$, is almost
embarrassingly wide, and seems to depend on how 'sequential' we wish the procedure to
be: to what extent, that is, we wish the sample number to be reduced below the maximum
of $N$, as the mean, $\mu$, increases. At one extreme we have the fixed-sample-size procedure, in
which the sample number is always $N$. At the other extreme we have procedures, with
low values of $\beta$ and high values of $\Delta(\beta)$, which terminate very much earlier for fairly high
values of $\mu$. It seems likely (but has not been proved) that, for given $\alpha$ and $N$, the procedures
based on the higher values of $\beta$ (i.e. the less 'sequential' ones) are the more powerful, in
the sense that for a given value of $\mu > 0$ the probability of crossing the upper boundary is
greater. In the absence of any exact formulation of risk functions, it would seem reasonable
(judging from Fig. 1) to use procedures based on values of $\beta$ between 0.001 and 0.50.

If the given value of $\alpha$ is one of those used in Table 1 (0.005, 0.025 or 0.05), the boundaries
of three procedures (based on $\beta$ = 0.01, 0.05 and 0.10) are easily found, for division of the
tabulated value by $\sqrt N$ gives $\Delta(\beta)$, and, from (2),
\[ a = \frac{\sigma}{\Delta(\beta)}\ln\frac{1-\beta}{\alpha}, \qquad b = \frac{\sigma\Delta(\beta)}{2}. \]


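2.4. We now prove the results stated in §2.3. We have first

THEOREM 1. For given values of $\alpha$ and $N$, the solution $\Delta(\beta)$ of (3) exists as a single-valued function
of $\beta$, decreasing monotonically from $+\infty$ to $-\infty$ as $\beta$ increases from 0 to 1.

Consider the function
\[ \phi = F\left\{\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N\right\}
 - \frac{1-\beta}{\alpha}\,F\left\{-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N\right\} - \beta. \tag{5} \]
(3) is satisfied if and only if $\phi = 0$.
From (5), after some reduction,
\[ \frac{\partial\phi}{\partial\Delta} = -\frac{1}{\Delta^2}\ln\frac{1-\beta}{\alpha}
 \left\{\frac{2(1-\beta)}{\pi\alpha N}\right\}^{1/2}
 \exp\left[-\tfrac12\left\{\frac{\ln^2\{(1-\beta)/\alpha\}}{\Delta^2 N} + \tfrac14\Delta^2 N\right\}\right]
 \lessgtr 0 \quad\text{according as}\quad \beta \lessgtr 1-\alpha. \]

As a function of $\Delta$, $\phi$ has a discontinuity at $\Delta = 0$. Consideration of the limiting values of
$\phi$ as $\Delta \to \pm\infty$ and $\pm 0$, and of the sign of $\partial\phi/\partial\Delta$, shows that $\phi(\Delta)$ assumes one of two distinct
forms, according as $\beta < 1-\alpha$ or $\beta > 1-\alpha$.

For $\beta < 1-\alpha$, $\phi$ decreases from $(1-\beta)(1-1/\alpha)$ to $(1-\beta)(1-1/\alpha) - 1$ as $\Delta$ increases from
$-\infty$ to $-0$, and from $1-\beta$ to $-\beta$ as $\Delta$ increases from $+0$ to $+\infty$.
For $\beta > 1-\alpha$, $\phi$ increases from $(1-\beta)(1-1/\alpha)$ to $1-\beta$ as $\Delta$ increases from $-\infty$ to $-0$,
and from $(1-\beta)(1-1/\alpha) - 1$ to $-\beta$ as $\Delta$ increases from $+0$ to $+\infty$.

The solution $\Delta(\beta)$ of (3) must therefore be a single-valued function of $\beta$, satisfying
\[ \Delta(\beta) \gtrless 0 \quad\text{as}\quad \beta \lessgtr 1-\alpha. \]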


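To prove the monotonicity of $\Delta(\beta)$, consider $\partial\phi/\partial\beta$. We find, after some reduction,
\[ \frac{\partial\phi}{\partial\beta} = -\frac{1}{\Delta}\left\{\frac{2}{\pi\alpha(1-\beta)N}\right\}^{1/2}
 \exp\left[-\tfrac12\left\{\frac{\ln^2\{(1-\beta)/\alpha\}}{\Delta^2 N} + \tfrac14\Delta^2 N\right\}\right]
 + \frac{1}{\alpha}\,F\left\{-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N\right\} - 1. \tag{6} \]
For $\Delta > 0$, $\beta < 1-\alpha$,
\[ \frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} + \tfrac12\Delta\sqrt N > \tfrac12\Delta\sqrt N > 0. \]
Hence, using the inequality on Mills's ratio given by Gordon (1941),
\[ F(u) < -u^{-1}(2\pi)^{-1/2}e^{-\frac12u^2} \quad (u < 0), \]
we have
\[ \frac{1}{\alpha}\,F\left\{-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N\right\}
 < \frac{1}{\Delta}\left\{\frac{2}{\pi\alpha(1-\beta)N}\right\}^{1/2}
 \exp\left[-\tfrac12\left\{\frac{\ln^2\{(1-\beta)/\alpha\}}{\Delta^2 N} + \tfrac14\Delta^2 N\right\}\right]. \]

Hence $\partial\phi/\partial\beta < -1$, and from previous remarks about the behaviour of $\phi$ it immediately
follows that for $\beta < 1-\alpha$ and $\Delta(\beta) > 0$, $\Delta(\beta)$ is a monotonically decreasing function of $\beta$,
the limit of which, as $\beta \to 0$, is $+\infty$.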

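For $\Delta < 0$, $\beta > 1-\alpha$, $\partial\phi/\partial\beta$ cannot have the same sign for all $\Delta$, since the upper and lower
limits, $1-\beta$ and $(1-\beta)(1-1/\alpha)$, converge as $\beta \to 1$. But, for $\phi = 0$, by (5),
\[ \frac{1}{\alpha}\,F\left\{-\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N\right\}
 = \frac{1}{1-\beta}\,F\left\{\frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N\right\} - \frac{\beta}{1-\beta}, \tag{7} \]
where
\[ \frac{\ln\{(1-\beta)/\alpha\}}{\Delta\sqrt N} - \tfrac12\Delta\sqrt N > -\tfrac12\Delta\sqrt N > 0, \]
and, by Gordon's inequality,
\[ F(v) = 1 - F(-v) > 1 - v^{-1}(2\pi)^{-1/2}e^{-\frac12v^2} \quad (v > 0). \]
Hence, the left-hand side of (7)
\[ > 1 + \frac{1}{\Delta}\left\{\frac{2}{\pi\alpha(1-\beta)N}\right\}^{1/2}
 \exp\left[-\tfrac12\left\{\frac{\ln^2\{(1-\beta)/\alpha\}}{\Delta^2 N} + \tfrac14\Delta^2 N\right\}\right], \]
and, from (6), $\partial\phi/\partial\beta > 0$.
Hence, and from previous remarks about the behaviour of $\phi$, it follows that for $\beta > 1-\alpha$
and $\Delta(\beta) < 0$, $\Delta(\beta)$ is a monotonically decreasing function of $\beta$, approaching the limiting
value of $-\infty$ as $\beta \to 1$.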

This completes Theorem 1. We have not strictly proved, as part of the theorem, that
$\Delta(\beta) \to 0$ as $\beta \to 1-\alpha$. It is easily seen, writing $1-\beta = \alpha(1+\delta)$ in (3) and letting $\delta \to 0$,
that a solution of (3) is possible only if $\Delta(\beta) = O(\delta)$.

To Theorem 1 we have the following

COROLLARY. Given $N$, any restricted sequential procedure [illegible] (having a specified risk
according to the diffusion approximation) is one of the family generated
by the variable $\beta$.

For, the slope $b$ of the boundaries determines a value of $\Delta(\beta)$, and hence of $\beta$, and the
corresponding boundaries must coincide with those given, since $a$ must be a single-valued
function of $P_U(0, N)$.

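The remaining results stated in §2.3 are included in

THEOREM 2. The ordinate, $a + bN$, at the point of intersection of $U$ and $M$, has a lower limit
of $u_\alpha\sigma\sqrt N$, approached as $\beta \to 1$. As $\beta \to 1-\alpha$, $a + bN \to u_{\frac12\alpha}\sigma\sqrt N$.

From (2),
\[ \frac{a+bN}{\sigma\sqrt N} = \frac{\ln\{(1-\beta)/\alpha\}}{\Delta(\beta)\sqrt N} + \tfrac12\Delta(\beta)\sqrt N, \]
and, from (3),
\[ 1 - F\left(\frac{a+bN}{\sigma\sqrt N}\right)
 = \frac{\alpha}{1-\beta}\left[F\left\{\frac{\ln\{(1-\beta)/\alpha\}}{\Delta(\beta)\sqrt N} - \tfrac12\Delta(\beta)\sqrt N\right\} - \beta\right] < \alpha. \]

Hence $(a+bN)/\sigma\sqrt N$ has a lower bound of $u_\alpha$, and tends to $u_\alpha$ as
\[ \frac{\ln\{(1-\beta)/\alpha\}}{\Delta(\beta)\sqrt N} - \tfrac12\Delta(\beta)\sqrt N \to \infty, \]
i.e. as $\beta \to 1$, and $\Delta(\beta) \to -\infty$.

The second part of the theorem follows from the result indicated at the end of Theorem 1,
that as $\beta \to 1-\alpha$, $\Delta(\beta) = O(\delta)$, where $1-\beta = \alpha(1+\delta)$. Then $(a+bN)/\sigma\sqrt N \sim \delta/\Delta(\beta)\sqrt N$
and (3) becomes
\[ 1-\alpha \sim F\{\delta/\Delta(\beta)\sqrt N\} - (1+\delta)\,F\{-\delta/\Delta(\beta)\sqrt N\} \sim 2F\{\delta/\Delta(\beta)\sqrt N\} - 1. \]
Hence $F\{\delta/\Delta(\beta)\sqrt N\} \to 1 - \tfrac12\alpha$, and the result follows.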


3. COMPARATIVE BINOMIAL TRIALS 


… and so on. … SS or FF, and 'untied' …
\[ \theta = \frac{(1-\pi_1)\pi_2}{(1-\pi_1)\pi_2 + \pi_1(1-\pi_2)}. \]
$\theta \gtreqless \tfrac12$ according as $\pi_2 \gtreqless \pi_1$, and the problem of testing the difference between the two binomial
parameters $\pi_1$ and $\pi_2$ reduces to that of testing the departure of the single binomial parameter
$\theta$ from the value $\tfrac12$.

We follow here the same basic approach to the problem of comparing two binomial
parameters $\pi_1$ and $\pi_2$, and consider restricted sequential procedures for sampling the population
of untied pairs, of which proportions $\theta$ and $1-\theta$ are of types FS and SF respectively.

3.2. Suppose that, of the first $n$ untied pairs, $n_1$ are of type FS and $n_2$ of type SF. Let
$y_n = 2n_1 - n = n_1 - n_2$. By analogy with the problem treated in §2.2, we shall first consider
the specification of upper and lower boundaries $U$ and $L$, and a middle boundary $M$ with
equation $n = N$, such that

(a) on the hypothesis $H_0$ that $\theta = \tfrac12$, the probabilities that sampling will end on each of
the outer boundaries are $\alpha < \tfrac12$;

(b) on the hypothesis $H_+[H_-]$ that $\theta = \theta_1 > \tfrac12$ $[\theta = 1-\theta_1 < \tfrac12]$, the probability of reaching
$U[L]$ is $1-\beta$ $(>\alpha)$.

The logarithm of the likelihood ratio of $H_+$ to $H_0$ is
\[ l = \tfrac12 y_n \log\{\theta_1/(1-\theta_1)\} - n\log\{\tfrac12\theta_1^{-1/2}(1-\theta_1)^{-1/2}\}. \]

As in §2.2 we arrange that, on $U$, $l = \log\{(1-\beta)/\alpha\}$. This gives the equations to $U$ and
$L$ as $y_n = a + bn$ and $y_n = -a - bn$, respectively, where
\[ a = \frac{2\log\{(1-\beta)/\alpha\}}{\log\{\theta_1/(1-\theta_1)\}}
 \qquad\text{and}\qquad
 b = \frac{2\log\{\tfrac12\theta_1^{-1/2}(1-\theta_1)^{-1/2}\}}{\log\{\theta_1/(1-\theta_1)\}}. \tag{8} \]

The requirements (a) and (b) will both be satisfied (apart from neglect of the overshoot of
the boundaries) if we choose $N$ to satisfy (b).
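The boundary constants in (8) are elementary to evaluate. The following sketch (function name hypothetical) computes $a$ and $b$ for given $\alpha$, $\beta$ and $\theta_1$, and can be used to check the first columns of Table 5.

```python
import math


def binomial_boundaries(alpha: float, beta: float, theta1: float):
    """Intercept a and slope b of the outer boundaries y_n = +/-(a + b n), from (8)."""
    log_odds = math.log(theta1 / (1.0 - theta1))
    a = 2.0 * math.log((1.0 - beta) / alpha) / log_odds
    b = 2.0 * math.log(0.5 / math.sqrt(theta1 * (1.0 - theta1))) / log_odds
    return a, b


if __name__ == "__main__":
    for theta1 in (0.55, 0.80):
        a, b = binomial_boundaries(0.025, 0.05, theta1)
        print(theta1, round(a, 2), round(b, 4))
```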

Now, for sequential sampling from a binomial population with any boundaries, the
probability of reaching a particular boundary point can be calculated exactly by enumeration
of the number of paths from the origin, reaching that point without previously crossing
a boundary. The required value of $N$ could therefore be obtained by straightforward,
although laborious, calculation of the probabilities, on $H_+$, of reaching the various boundary
points on $U$. It will be useful, however, to have a manageable approximation to the value
of $N$, and this can be found by applying to the binomial random walk the diffusion approximation
already used in §2.

We have $y_n = \sum_{i=1}^{n} x_i$, where the $x_i$ are independent variates taking values $+1$ and $-1$ with
probabilities $\theta$ and $1-\theta$ respectively. The random walk will therefore be approximated
by a diffusion process with drift $\mu - b$ per unit time, growth in variance at a rate $\sigma^2$, and an
absorbing barrier at $a$, where
\[ \mu = 2\theta - 1 \quad\text{and}\quad \sigma^2 = 4\theta(1-\theta). \]

Requirement (b) will be approximately satisfied then, if (from (1) and the subsequent
discussion) we ensure that
\[ 1 - F\left(\frac{a - m_1N}{\sigma_1\sqrt N}\right)
 + \exp\left(\frac{2am_1}{\sigma_1^2}\right)F\left(\frac{-a - m_1N}{\sigma_1\sqrt N}\right) = 1-\beta, \tag{9} \]
where $a$ is given by (8), $\sigma_1^2 = 4\theta_1(1-\theta_1)$, and
\[ m_1 = 2\theta_1 - 1 - \frac{2\ln(1/\sigma_1)}{\ln\{\theta_1/(1-\theta_1)\}}. \]
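Equation (9) can be solved for $N$ numerically. The sketch below is an illustration only: it uses bisection (the paper does not say how (9) was solved), treats $N$ as continuous, and uses hypothetical function names. The drift is computed as $2\theta - 1 - b$, which equals $m_1$ when $\theta = \theta_1$.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function F(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def crossing_probability(theta: float, n: float, a: float, b: float) -> float:
    """Diffusion approximation to the chance of reaching U within n untied pairs."""
    m = 2.0 * theta - 1.0 - b                 # drift relative to the sloping boundary
    var = 4.0 * theta * (1.0 - theta)         # variance per untied pair
    s = math.sqrt(var * n)
    return (1.0 - normal_cdf((a - m * n) / s)
            + math.exp(2.0 * a * m / var) * normal_cdf((-a - m * n) / s))


def solve_n1(alpha: float, beta: float, theta1: float):
    """Return (N_1, a, b), with N_1 the (non-integer) solution of (9)."""
    log_odds = math.log(theta1 / (1.0 - theta1))
    a = 2.0 * math.log((1.0 - beta) / alpha) / log_odds
    b = 2.0 * math.log(0.5 / math.sqrt(theta1 * (1.0 - theta1))) / log_odds
    lo, hi = 1.0, 1.0e6
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # The crossing probability increases with the number of pairs allowed.
        if crossing_probability(theta1, mid, a, b) < 1.0 - beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), a, b


if __name__ == "__main__":
    n1, a, b = solve_n1(0.025, 0.05, 0.80)
    print(round(n1, 1), round(a, 3), round(b, 4))
```

In practice the solution would be rounded, or moved to the nearest attainable boundary point, before use.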


Table 5 gives the solution $N = N_1$ of (9), and the values of $a$ and $b$, for $\alpha = 0.025$, $\beta = 0.05$
and various values of $\theta_1$.

As in the problem considered in §2.2, the solution tabulated in Table 5
provides only one member of a family of procedures approximately satisfying (a) and (b).
(The discreteness of the variate $x$ prevents an exact solution being found.) Another
member of the family is a fixed-sample-size procedure with $N = N_0$, say. The choice between
neighbouring values of $N$, each providing approximations to $\alpha$ and $\beta$, may not be clear;
in practice suitable values may be obtained from the following normal approximation:
\[ N_0 = \{\sqrt{\theta_1(1-\theta_1)}\,u_\beta + \tfrac12u_\alpha\}^2/(\theta_1 - \tfrac12)^2, \]
where, as in §2.2, $F(u_\alpha) = 1-\alpha$, $F(u_\beta) = 1-\beta$.

Values of $N_0$ are given in Table 5. Values of $N_1/N_0$ are shown in the last column of Table 5;
for $\theta_1$ near 0.5, the ratio is close to the normal equivalent 1.368 given in Table 2, as might
be expected.
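The normal approximation for the equivalent fixed sample size can be evaluated directly. This sketch assumes the formula as reconstructed above, $N_0 = \{\sqrt{\theta_1(1-\theta_1)}\,u_\beta + \tfrac12u_\alpha\}^2/(\theta_1-\tfrac12)^2$; the function name is illustrative, and rounding to the nearest integer is an assumption about how the tabulated values were obtained.

```python
import math
from statistics import NormalDist


def fixed_sample_size(alpha: float, beta: float, theta1: float) -> float:
    """Normal approximation to the equivalent fixed number of untied pairs, N_0."""
    std_normal = NormalDist()
    u_alpha = std_normal.inv_cdf(1.0 - alpha)
    u_beta = std_normal.inv_cdf(1.0 - beta)
    numerator = (math.sqrt(theta1 * (1.0 - theta1)) * u_beta + 0.5 * u_alpha) ** 2
    return numerator / (theta1 - 0.5) ** 2


if __name__ == "__main__":
    for theta1 in (0.55, 0.60, 0.80, 0.95):
        print(theta1, round(fixed_sample_size(0.025, 0.05, theta1)))
```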


Table 5. Procedure for comparison of two proportions. Parameters of boundaries; maximum
number of untied pairs, $N_1$, with nearest boundary point at $N_1'$; and equivalent fixed-sample
size, $N_0$; for $\alpha = 0.025$, $\beta = 0.05$ and various values of $\theta_1$

              Parameters of         Maximum number       Equivalent
              boundaries            of untied pairs      fixed-sample
$\theta_1$    a         b           $N_1$     $N_1'$     size, $N_0$     $N_1/N_0$
0.55          36.25     0.0501      1778      1778       1294            1.37
0.60          17.94     0.1007       438       439        319            1.37
0.65          11.75     0.1524       192       191        138            1.39
0.70           8.59     0.2058       105       104         75            1.40
0.75           6.62     0.2619        65        66         46            1.41
0.80           5.25     0.3219        43        44         30            1.48
0.85           4.19     0.3882        30        30         20            1.50
0.90           3.31     0.4650        21        22         14            1.50
0.95           2.47     0.5640        14        15          9            1.56


The outer boundaries $U$ and $L$ can be crossed only at a discrete set $\{N'\}$ of values of $n$,
and it would seem reasonable to place $M$ at $n = N_1'$, where $N_1'$ is the member of $\{N'\}$ nearest
to $N_1$. Values of $N_1'$ are given in Table 5.

A second modification is to replace $M$ by the wedge-shaped boundary $M'$ illustrated in
Fig. 2. Any path crossing $M'$ must also, if continued, cross $M$ rather than $U$ or $L$. Hence
the replacement of $M$ by $M'$ does not affect the probabilities of reaching any boundary
point on $U$ or $L$, whereas the average sample number for any $\theta$ must be reduced.

3.3. As in the similar problem considered in §2.3, we might wish to specify in advance
the maximum number of untied pairs, $N$, in addition to the probability $\alpha$. As in §2.3 we
could generate a family of procedures satisfying these requirements by solving (9) for $\theta_1$
for any chosen value of $\beta$. This has not been carried out, but for $\alpha = 0.025$ one member of
the family (corresponding to $\beta = 0.05$) can be obtained from Table 5, by interpolating for
the specified value of $N$ in the column headed $N_1$.

It would perhaps be possible to prove results analogous to those in §2.4, but such results
have not yet been obtained.


3.4. The theoretical basis proposed above, for restricted sequential procedures to compare
binomial variates, involves a number of approximations: the neglect of the effect of
one outer boundary on the probabilities of reaching the other; the use of diffusion theory
to represent a discrete sampling process; and the application of normal distribution theory
to a problem involving a binomial variate. As a check on the validity of the theory, some
exact probabilities have been calculated for the procedure tabulated in Table 5, with
$\alpha = 0.025$, $\beta = 0.05$, $\theta_1 = 0.8$, $N_1' = 44$. The boundaries, including the modified middle
boundary $M'$, are shown in Fig. 2. The probabilities of reaching the various boundary
points, for $\theta = 0.5$ and 0.8, are given in Table 6.


[Fig. 2. Comparison of two proportions: boundaries for procedure with $\alpha = 0.025$, $\beta = 0.05$,
$\theta_1 = 0.8$ and $N = 44$. Boundary points shown by circles. Axes: number of untied pairs, $n$;
$y_n$, number of FS pairs minus number of SF pairs.]


The exact probabilities were calculated by enumeration of paths (Stockman & Armitage,
1946), for various values of $\theta$. As in (9), the diffusion approximation gives, for the probability
of crossing $U$ before $n = n'$,
\[ 1 - F\left(\frac{a - mn'}{\sigma\sqrt{n'}}\right)
 + \exp\left(\frac{2am}{\sigma^2}\right)F\left(\frac{-a - mn'}{\sigma\sqrt{n'}}\right), \tag{10} \]
where $a = 5.248$, $\sigma^2 = 4\theta(1-\theta)$, and $m = 2\theta - 1.3219$.
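The comparison between exact and theoretical probabilities can be reproduced in outline. The sketch below is not the original computation: it replaces hand enumeration of paths by a forward recursion over the random walk, uses the rounded constants $a = 5.248$, $b = 0.3219$ and a terminating boundary at $n = 44$ quoted in the text, and evaluates formula (10) alongside. Function names are hypothetical.

```python
import math

A, B, N_MAX = 5.248, 0.3219, 44  # constants quoted in the text for theta_1 = 0.8


def normal_cdf(x: float) -> float:
    """Standard normal distribution function F(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exact_upper(theta: float) -> dict:
    """Exact probability of first reaching U at each n, by forward recursion."""
    alive = {0: 1.0}  # y_n -> probability of being there with no boundary reached
    upper = {}
    for n in range(1, N_MAX + 1):
        moved = {}
        for y, p in alive.items():
            moved[y + 1] = moved.get(y + 1, 0.0) + p * theta          # FS pair
            moved[y - 1] = moved.get(y - 1, 0.0) + p * (1.0 - theta)  # SF pair
        alive = {}
        for y, p in moved.items():
            if y >= A + B * n:
                upper[n] = upper.get(n, 0.0) + p   # absorbed on U
            elif y > -(A + B * n):
                alive[y] = p                        # still between U and L
            # otherwise the path is absorbed on L and is dropped
    return upper


def diffusion_upper(theta: float, n: float) -> float:
    """Formula (10): approximate probability of crossing U within n untied pairs."""
    m = 2.0 * theta - 1.0 - B
    var = 4.0 * theta * (1.0 - theta)
    s = math.sqrt(var * n)
    return (1.0 - normal_cdf((A - m * n) / s)
            + math.exp(2.0 * A * m / var) * normal_cdf((-A - m * n) / s))


if __name__ == "__main__":
    for theta in (0.5, 0.8):
        exact = exact_upper(theta)
        print(theta, sorted(exact), round(sum(exact.values()), 4),
              round(diffusion_upper(theta, N_MAX), 4))
```

The recursion stores only the probabilities of the points not yet absorbed, so its cost grows roughly with the square of the maximum sample number.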
Table 6 compares the theoretical and actual probability distributions on the thirteen
boundary points of $U$, for $\theta = 0.8$ and 0.5. The cumulative distributions are shown in
Fig. 3. In these comparisons the theoretical probability of crossing between two adjacent
boundary points $n_1'$ and $n_2'$ is taken as an approximation to the actual probability at the
higher value, $n_2'$; the cumulative distributions therefore give the comparison of the
probabilities of crossing $U$ in not more than $n'$ observations, for various $n'$. The agreement
seems to be fairly good for $\theta = 0.8$, although there is a noticeable excess of theoretical over
exact values for $\theta = 0.5$. This is to some extent the effect (familiar in Wald's procedures)
of the overshooting of the boundaries, which ensures that at each boundary point the
likelihood ratio of $H_+$ to $H_0$ exceeds the nominal value of $(1-\beta)/\alpha$.*

Table 6. Procedure for comparison of two proportions, with $\alpha = 0.025$, $\beta = 0.05$ and $\theta_1 = 0.80$.
Probability distribution on each boundary, for $\theta = 0.5$ and 0.8, together with the theoretical
approximation to the distribution on the upper boundary, $U$, given by (10)

Probability of crossing boundary at $n'$th untied pair

              $\theta = 0.5$                  $\theta = 0.8$
              Upper and lower boundaries      Upper boundary            Lower boundary
$n'$          Exact       Theoretical         Exact      Theoretical    (exact)
 8            0.00391     0.00871             0.1678     0.1366         0.0000026
11            0.00391     0.00565             0.1718     0.1515         0.0000007
14            0.00317     0.00457             0.1429     0.1454         0.0000001
17            0.00244     0.00351             0.1126     0.1230         0.0000000
20            0.00185     0.00266             0.0873     0.0987         0.0000000
23            0.00140     0.00202             0.0675     0.0774         0.0000000
26            0.00106     0.00154             0.0523     0.0600         0.0000000
29            0.00080     0.00118             0.0407     0.0464         0.0000000
32            0.00061     0.00091             0.0318     0.0358         0.0000000
35            0.00047     0.00070             0.0250     0.0276         0.0000000
38            0.00036     0.00054             0.0197     0.0213         0.0000000
41            0.00028     0.00043             0.0156     0.0165         0.0000000
44            0.00022     0.00034             0.0124     0.0129         0.0000000
Total,
$n' \le 44$   0.02047     0.03275             0.9473     0.9532         0.0000034

                              Middle boundary     Middle boundary
$n'$                          (exact)             (exact)
39-44  (upper section)        0.0090              0.0270
33-38  (upper section)        0.0824              0.0197
27-32  (upper section)        0.3108              0.0053
26                            0.1546              0.0005
27-32  (lower section)        0.3108              0.0001
33-38  (lower section)        0.0824              0.0000
39-44  (lower section)        0.0090              0.0000
Total,
$26 \le n' \le 44$            0.9590              0.0526

The total probabilities, theoretical and actual, of reaching $U$ for various values of $\theta$ are
shown in Table 7. The agreement is again fairly good. Table 7 gives also the exact average
and variance of the sample number for various values of $\theta$; these relate to the procedure
with the modified boundary, $M'$. The ASN curve is shown in Fig. 4, together with the
approximate ASN curve for a two-sided modification of a Wald sequential procedure with
the same nominal values of $\alpha$, $\beta$ and $\theta_1$, and the equivalent fixed-sample-size of $N_0 = 30$.
(In the fixed-sample-size procedure with $N_0 = 30$ the boundaries $U$, $M$ and $L$ correspond
to the results $n_1 \ge 21$, $10 \le n_1 \le 20$, and $n_1 \le 9$, respectively; the probabilities of reaching
$U$ on $H_0$ and $H_+$ are then 0.021 and 0.939, which agree reasonably well with the corresponding
values in Table 7.)

* A very much better approximation to the probabilities for $\theta = 0.5$ is obtained by calculating the
likelihood ratio of $H_+$ to $H_0$ at each boundary point, and dividing the theoretical probability for
$\theta = 0.8$ by this ratio.

[Fig. 3. Comparison of two proportions: procedure with $\alpha = 0.025$, $\beta = 0.05$, $\theta_1 = 0.8$ and $N = 44$.
Probability of reaching upper boundary with not more than $n'$ untied pairs, (a) when $\theta = 0.5$
and (b) when $\theta = 0.8$. The step function shows exact values, and the continuous function the
theoretical diffusion approximation.]

Table 7. Procedure for comparison of two proportions, with $\alpha = 0.025$, $\beta = 0.05$, $\theta_1 = 0.80$,
and $N = 44$. Exact and theoretical probabilities of reaching $U$, and exact average and
variance of sample number, for various values of $\theta$

                                                Exact distribution of sample
              Probability of reaching $U$       number (untied pairs)
$\theta$      Exact       Theoretical           Mean           Variance
0.50          0.0205      0.0327                [illegible]     19.83
0.60          0.1490      0.1850                30.0            46.06
0.70          0.5623      0.5860                27.90          115.55
0.80          0.9473      0.9532                18.81           97.21
0.85          0.9941      0.9965                14.46           49.42
0.90          0.9999      1.0000                11.43           18.80
0.95          1.0000      1.0000                 9.41            5.57
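The two probabilities quoted for the fixed-sample-size procedure are ordinary binomial tail sums and are easily checked. A minimal sketch (function name illustrative), taking $U$ to correspond to $n_1 \ge 21$ out of 30 untied pairs as stated in the text:

```python
from math import comb


def upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X binomial with index n and success probability p."""
    return sum(comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    print(round(upper_tail(30, 21, 0.5), 3))  # probability of reaching U on H_0
    print(round(upper_tail(30, 21, 0.8), 3))  # probability of reaching U on H_+
```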


3.5. The procedures described above may be used for other situations in which repeated
qualitative comparisons are made between two treatments. For example, two medical
treatments may be placed in order of effectiveness for each of a number of subjects. An
'untied pair' would then correspond to a subject giving a definite preference for one or
other treatment, and $n_1$ and $n_2$ would be the numbers of preferences in favour of the first
and second treatments.


[Fig. 4. Comparison of two proportions: average sample number (ASN) of untied pairs as a function
of $\theta$, for three procedures with approximately $\alpha = 0.025$, $\beta = 0.05$, $\theta_1 = 0.8$: a two-sided
Wald scheme (Armitage, 1954; ASN for … from de Boer's (1953) method, other values
approximated by formula for … Wald scheme (Wald, 1947)); the restricted sequential procedure
with $N = 44$ (exact values); and the fixed sample-size procedure with $N_0 = 30$.]

… could theoretically be obtained … sequential t-test … sample size on sequential t boundaries.
… from a normal distribution with mean $\mu$ … based on the statistic …

Suppose that a restricted sequential procedure is required, with an upper boundary $U$ and
a terminating boundary $M$ (at $n = N$), satisfying the conditions:

(a) on the hypothesis $H_0$ that $\mu = 0$, the probability that sampling will end on $U$ is $2\alpha$;

(b) on the hypothesis $H_+[H_-]$ that $\mu/\sigma = \Delta > 0$ $[\mu/\sigma = -\Delta < 0]$, the probability of reaching
$U$ is $1-\beta$.

If $U$ is chosen to be the upper boundary of a sequential t-test, with risks $2\alpha$ and $\beta$ and the
appropriate value for $\Delta$, the probabilities required by (a) and (b) will be approximately in
the correct ratio. In the absence of any sound method of determining $N$, one might determine
the size of the equivalent fixed sample, say $N_0$, and conjecture that $N/N_0$ takes approximately
the same value as when $\sigma^2$ is known. This value is given in terms of $\alpha$ and $\beta$ in Table 2.


5. SIGNIFICANCE TESTING AND ESTIMATION 


In the literature on sequential tests of significance, it has almost invariably been assumed 
that the object of a significance test is to state whether or not a null hypothesis is disproved 
at some fixed level of probability (say, 5%). In fixed-sample-size tests, however, it is 
usually regarded as more useful to indicate the exact probability at which a particular 
result is just significant; that is, the probability of obtaining, in repeated sampling under 
specified conditions, a result at least as extreme (in some sense) as that observed. There 
seems no reason why this practice should not be followed with sequential experimentation, 
provided that the boundary points can be ordered, in some reasonable way, in terms of the 
apparent deviation from the null hypothesis.

In the type of sequential procedure considered in this paper, such an ordering can be
made with little ambiguity. Suppose that the procedure described in §2 is used to test the
null hypothesis that $\mu = 0$. If there were no overshooting of the boundaries the sample
mean, $\bar{x}$, when the procedure terminated, would increase monotonically as the boundary
point moved round the boundaries from low to high values of $n$ on $L$, along $M$, and from high
to low values of $n$ on $U$. If $|\bar{x}|$ is used as an estimate of departure from the null hypothesis,
the probability level corresponding to any boundary point, on a two-sided significance test,
will be the probability (on the null hypothesis) of reaching boundary points with higher
values of $|\bar{x}|$ than that observed.

Some ambiguity is caused by the discrete steps in $n$, but it would be easy to formulate
reasonable ordering rules. For example, if the procedure terminated at a sample size $n'$,
the possible pairs of $(n', \bar{x})$ could be arranged in decreasing order of significance as follows:
(i) all results with $n' = 1$, in decreasing order of $|\bar{x}|$; (ii) all results with $n' = 2$, in decreasing
order of $|\bar{x}|$; and so on. Alternatively, results could be arranged in order of $|\bar{x}|$ without
reference to $n'$.

In the binomial procedures of §3, also, slight ambiguity is caused by the overshooting
of the boundaries, so that the proportion of FS pairs at the termination, $\hat\theta$, may not always
increase monotonically as the boundary point moves from one extreme to the other. The
simplest rule here would be to arrange the points in terms of the natural order round the
boundaries. Thus, on the outer boundaries, smaller values of $n$ would represent more
extreme results, and on the modified middle boundary the results would be ordered
according to $|\hat\theta - \tfrac12|$. As is shown by Table 6, the diffusion theory may not provide an
adequate approximation to the required probability, and it would be preferable to calculate
it exactly.

This approach may be applied to other types of sequential procedure, in particular to
Wald sequential tests, provided that the probabilities, on the null hypothesis to be tested,
of reaching the individual boundary points are known.

If these probability distributions are known it is possible in principle to examine the
properties of point estimators of unknown parameters, and to formulate rules for obtaining
confidence intervals. In general, intervals obtained from the usual fixed-sample-size
formulae will not yield the nominal confidence coefficients in repeated sampling with sequential
stopping rules. Some numerical investigations on a number of sequential procedures
for binomial variates will be reported in another paper.

I am indebted to Miss I. Allen for computational help.

ON THEORETICAL MODELS FOR COMPETITIVE AND
PREDATORY BIOLOGICAL SYSTEMS 


By M. S. BARTLETT 


University of Manchester 


1. PREAMBLE*


Differences of opinion are obviously possible on the degree to which admittedly over- 
simplified theoretical models can explain some of the complex observational phenomena 
to be found in nature. Criticisms from biologists of the mathematical work of Lotka (1925), 
Volterra (1926) and subsequent writers on the growth and interaction of biological popula- 
tions have, however, sometimes been justified and sometimes unjustified, for in spite of 
inevitable limitations such work constitutes a permanent contribution to the understanding 
of how populations may behave. A significant constructive survey was made by Gause 
(1934) when he attempted to bridge the gap between theoretical models and natural bio- 
logical phenomena by controlled laboratory experiments in animal ecology. While experi- 
ments also have limitations as representations of nature, the role of both theory and 
experiment in the physical sciences might be recalled by any biologists inclined to be 
sceptical of the value of either. 

The interrelation of these approaches in biology may be illustrated in the field of epi- 
demiology. Here the vicissitudes of infected populations have been studied in the laboratory 
as well as in the field; but, as I have emphasized elsewhere (Bartlett, 1956, 1957), the pro- 
perties of theoretical models indicate, among other things, the extent to which population 
size may sometimes be crucial in the probable sequence of events, and thus indicate to what 
extent laboratory observations will have any similarity to larger-scale field observations 
even if the same model applies to both. An essential point is that recent theoretical formula- 
tions explicitly recognize the discrete character of populations and the stochastic or random 
aspect of changes, as distinct from strictly deterministic formulations. The need for this in 
ecology, which was already perhaps envisaged by Gause (1934, p. 124), is quite apparent 
in the experiments by Park with the flour-beetle Tribolium, in which one of two species 
together in a container survived not every time, but with a definite probability (e.g. 30% 
of times), that could be estimated by replication and changed by changing the environment 
(see, for example, Neyman, Park & Scott, 1956). 

There is now no mathematical difficulty in the formulation of stochastic models (see, for
example, Bartlett, 1955a, 1956), and such formulations have already been made for typical
ecological models by Chin Long Chiang (see Kempthorne et al. 1954). The greater intractability
of even the simplest of these is, however, a serious obstacle to progress, especially
in animal ecology, where even in the deterministic formulations of Lotka and Volterra many 
simplifications, such as neglect of age structure or of other heterogeneity, were made. One
aim of the ensuing discussion is to indicate the enhanced value of deterministic formulations
of population dynamics when properly interpreted within more comprehensive stochastic
models. After some remarks on the logistic model of population growth for a single species,
two problems are referred to: (i) the classical Lotka-Volterra predator-prey relation;
(ii) competition between two species, with special reference to the competition between
the two species of flour-beetle, Tribolium confusum and Tribolium castaneum, as investigated
by Park.

* Some of these introductory comments were also included in a survey paper, ‘Some applications of
probability and statistical theory in biology’, given at the Third Soviet Mathematical Congress held in
Moscow in 1956.


2. THE LOGISTIC MODEL


The properties of the logistic law of growth, introduced by Verhulst (see Andrewartha
& Birch, 1954) and much discussed in the literature, need not be recapitulated here. In
deterministic form, the rate of increase of a population of size $N$ is assumed to be

$$DN = \alpha N - \beta N^2, \tag{1}$$

where $D \equiv d/dt$ denotes differentiation with respect to the time $t$. It is well known that this
simple equation ignores deterministic oscillations arising from age structure, and in a
complete discussion it would be advisable to investigate the amplitude of oscillations in
stochastic formulations, due to such a complication, along the lines to be developed for the
simpler case above. However, the main point I wish to make is that the rate of increase
given in (1) is a net balance of births and deaths, and many different stochastic models
compatible with (1) are possible (cf. Kendall, 1949). In the extreme case deaths are negligible
compared with births, and the chance of a birth in time $\delta t$ (independently of previous
events) will be assumed to be $(\alpha N - \beta N^2)\,\delta t + o(\delta t)$.* The difference between this stochastic
model and (1) has been already studied in some detail (see Feller, 1939; Bailey, 1950).
The asymptotic value for the population size is in this case still fixed at $\alpha/\beta$. But if we
assume that the chance of a birth in time $\delta t$ is $(\alpha_1 N - \beta_1 N^2)\,\delta t$ and of a death
$(\alpha_2 N + \beta_2 N^2)\,\delta t$, then we have four coefficients connected by the relations

$$\alpha_1 - \alpha_2 = \alpha, \qquad \beta_1 + \beta_2 = \beta. \tag{2}$$

In a small interval $\delta t$, we may write

$$\delta N = (\alpha N - \beta N^2)\,\delta t + \delta Z_1 - \delta Z_2, \tag{3}$$

where, as the first term on the right-hand side of (3) represents the systematic part of the
random or stochastic change $\delta N$ (which can only be 0 or $\pm 1$ as $\delta t \to 0$), $\delta Z_1$ and $\delta Z_2$ are
independent (modified) Poisson variables with zero means, and variances $(\alpha_1 N - \beta_1 N^2)\,\delta t$,
$(\alpha_2 N + \beta_2 N^2)\,\delta t$ respectively. To illustrate the use of equation (3), consider the asymptotic
situation when $N \sim \alpha/\beta$; put, more precisely, $N = \alpha(1+u)/\beta$. Then

$$\delta u = -\alpha(1+u)u\,\delta t + \beta(\delta Z_1 - \delta Z_2)/\alpha. \tag{4}$$

This non-linear stochastic equation can be approximated for small $u$ by

$$\delta u = -\alpha u\,\delta t + \delta Z, \tag{5}$$

where $\delta Z$ has variance $[(\alpha_1 + \alpha_2)\beta/\alpha + \beta_2 - \beta_1]\,\delta t = \gamma\,\delta t$, say. In any ‘steady state’ the
quantity $u$ at time $t$ has the same statistical properties as the quantity $u + \delta u$ at time $t + \delta t$;
this gives for the variance $\sigma^2$ of $u$, on squaring and averaging the equation for $u + \delta u$
obtained from (5),

$$\sigma^2 = \sigma^2(1 - 2\alpha\,\delta t) + \gamma\,\delta t,$$

whence $\sigma^2 = \tfrac12\gamma/\alpha$. Strictly, as the value $N = 0$ represents an ‘absorbing state’ (and the
value $N = \alpha_1/\beta_1$ an upper limit which cannot be exceeded), it follows from the theory of
finite Markov chains that the ultimate state is $N = 0$, provided this state is accessible from
any other admissible value of $N$ (i.e. $\alpha_2$ and $\beta_2$ positive). However, it seems evident from the
result above for $\sigma^2$ that the chance of extinction, once the value of $N$ is in the neighbourhood
of the value $\alpha/\beta$, may be neglected for any given time interval, provided $\gamma/\alpha$ is sufficiently
small. Under such conditions the population will thus, in contrast with the pure birth
process, continue to show fluctuations with this variance. Clearly, before persistent
fluctuations observed in real animal populations are considered incompatible with the
logistic model, their size and characteristics should be compared with fluctuations predicted
from stochastic models like the above. It is essential for this comparison, in the reporting
of laboratory experiments, that individual replications be recorded separately, and also that the
total size of the population (rather than its density) be noted.

* This is the simplest probability assumption which for large $N$ is equivalent to (1). The last term
$o(\delta t)$, denoting a remainder $R$ such that $R/\delta t \to 0$ as $\delta t \to 0$, is inserted for strict rigour, but is for
convenience omitted in subsequent formulae.
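
The steady-state result above can be turned into a small numerical check. The following is a minimal sketch, assuming the birth- and death-rates of the form given for equation (3); the function name and the parameter values are illustrative and are not taken from the paper.

```python
def logistic_steady_state_variance(alpha1, alpha2, beta1, beta2):
    """Return (gamma, var_u, var_n) for the logistic birth-and-death model.

    Assumed rates: births alpha1*N - beta1*N**2, deaths alpha2*N + beta2*N**2.
    """
    alpha = alpha1 - alpha2          # net linear coefficient, relation (2)
    beta = beta1 + beta2             # net quadratic coefficient, relation (2)
    if alpha <= 0 or beta <= 0:
        raise ValueError("need alpha > 0 and beta > 0 for a positive ceiling")
    # Variance rate of the noise term in the linearized equation (5).
    gamma = (alpha1 + alpha2) * beta / alpha + beta2 - beta1
    var_u = gamma / (2.0 * alpha)            # sigma^2 = gamma / (2 alpha)
    var_n = (alpha / beta) ** 2 * var_u      # because N = (alpha/beta)(1 + u)
    return gamma, var_u, var_n


# Hypothetical coefficients: ceiling alpha/beta = 100 individuals.
gamma, var_u, var_n = logistic_steady_state_variance(1.5, 0.5, 0.005, 0.005)
print(gamma, var_u, var_n)
```

With these illustrative values the standard deviation of $N$ about the ceiling of 100 is about 10, which shows why the total population size, and not only its density, matters for the comparison.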

In the discussion of the logistic curve given by Andrewartha & Birch (1954), oscillatory
fluctuations in the total population are depicted from experiments on Tribolium, the weevil
Calandra oryzae and the cladoceran genus Daphnia magna; and the effect of a more complex
life history than assumed for the logistic model is, as the authors note, a possible con- 
tributory cause of these. The existence of damped oscillations in a deterministic model with 
an age structure has been noted by Leslie (1948), and the further important point is that in 
a stochastic model such oscillations can maintain themselves even in the steady state, with 
an amplitude depending on the population size. In the simple one age-group case above, the 
fluctuations have no true oscillatory character; in a multi-stage population, oscillatory
tendencies arise if the roots determining the behaviour of small fluctuations about the 
steady state are complex-valued (such investigation of the nature of fluctuations is illu- 
strated below in the two further problems to be discussed). In the case of bisexual repro- 
duction, the further complication of unequal numbers of the two sexes should also strictly 
be taken into account. 


3. THE CLASSICAL PREY-PREDATOR SYSTEM 


The simplest deterministic Lotka-Volterra equations for a prey-predator system are

$$DH = (\alpha_1 - \beta_1 P)H, \qquad DP = (-\alpha_2 + \beta_2 H)P, \tag{6}$$

where $H$ denotes the population size of the ‘prey’ (or ‘hosts’), and $P$ the size of the ‘predators’
(or ‘parasites’). When I refer to these equations as classical, no implication is intended
that they represent precisely any real biological systems. They are important in
representing the simplest theoretical model of prey-predator interaction that can be
specified, and as a prelude to further discussion their main consequences are recalled.

From (6),

$$\frac{dH}{dP} = \frac{(\alpha_1 - \beta_1 P)H}{(-\alpha_2 + \beta_2 H)P},$$

whence by integration

$$f(H, P) \equiv -\alpha_2 \log H + \beta_2 H - \alpha_1 \log P + \beta_1 P = \text{constant}. \tag{7}$$

The curves represented by (7) are closed cycles. The equilibrium point is given by $P_0 = \alpha_1/\beta_1$,
$H_0 = \alpha_2/\beta_2$, but this is neutral, in the sense that there is no damping towards this point if
the system is at any other point on the $H$, $P$ graph. For small cycles around $(H_0, P_0)$, it is
easily shown that the path is the ellipse

$$\alpha_2 h^2 + \alpha_1 p^2 = \text{constant}, \tag{8}$$

where $H = H_0(1+h)$, $P = P_0(1+p)$. For larger cycles, the path is restricted by the axes
$H = 0$, $P = 0$, but it is obvious that any stochastic formulation will lead to fluctuations
which will be especially important near either $H = 0$ or $P = 0$, for if either $H$ or $P$ becomes
zero the oscillatory character of the system disappears. Thus, while modified deterministic
formulations, either by way of more complicated assumptions in models like (6), or (as by
A. J. Nicholson and V. A. Bailey; see, for example, Bach & Smith, 1941) in terms of discrete
generations, have led to unstable systems, we see that even (6), when incorporated into a
stochastic model, is ultimately unstable, for the drift due to stochastic fluctuations will lead
sooner or later to the total extinction of the predator species, either (i) before the other
species or (ii) by starvation due to the extinction of the prey first.
To illustrate this point, a stochastic model was set up compatible with (6). There is the
same difficulty as with the logistic model that many different birth- and death-rates are
consistent with the same net rates of increase of $H$ and $P$, but (corresponding roughly with
the conditions in some actual situations) $\alpha_1$ may be interpreted for simplicity as a pure
birth-rate for $H$, and $\alpha_2$ as a pure death-rate for $P$ in the absence of $H$. The relation of the
second term $\beta_2 HP$ in the second equation with the death-rate $\beta_1 P$ for $H$ is not so immediate
as in the case of epidemic theory, where susceptibles when in contact with infecteds turn
into more infecteds, but will be assumed to be a consequence of an increase $\beta_2 H$ in the
birth-rate for the predators. (The precise assumptions made are given later below.)
It should be added at this stage that these simplified assumptions
are likely to be even less realistic in the case of the type of system now under consideration
than in, say, epidemic theory, for the basic new [...] and spatial variability [...]; but this
will not be considered here.

The constants chosen for the illustrative stochastic model were

$$\alpha_1 = 1, \quad \beta_1 = 0.1, \quad \alpha_2 = 0.5, \quad \beta_2 = 0.02,$$

leading to $P_0 = 10$, $H_0 = 25$. An artificial realization was started with $H = 25$, $P = 2$, and
developed* by standard ‘Monte Carlo’ technique (cf., for example, Bartlett, 1955) [...]

* To correspond to $\beta_2/\beta_1 = \tfrac15$ the predator population was assumed to increase by one for only one
in five times that a prey was exterminated. The precise transition probabilities assumed were
thus:

    Transition from (H, P) to      Probability in time δt
    (H+1, P)                       H δt
    (H, P-1)                       0.5 P δt
    (H-1, P)                       0.08 HP δt
    (H-1, P+1)                     0.02 HP δt



and less well-defined cycle and then to final extinction of P. Qualitatively, this series
shows many of the characteristics that have been observed in small-scale laboratory 
experiments—variation in amplitude of the cycle, possible extinction first of H (and then P), 
or of P directly (and subsequent unlimited increase of H). 
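
An artificial series of this kind can be sketched with a standard event-by-event (Gillespie-type) simulation. The sketch below is not the author's original computation: it assumes the four transition rates $H\,\delta t$, $0.5P\,\delta t$, $0.08HP\,\delta t$ and $0.02HP\,\delta t$ (that is, $\alpha_1 = 1$, $\beta_1 = 0.1$, $\alpha_2 = 0.5$, $\beta_2 = 0.02$, with one predator born per five prey destroyed), and the function name, seed and step limit are hypothetical.

```python
import random


def simulate_prey_predator(h0=25, p0=2, max_steps=100_000, seed=1):
    """Simulate the stochastic prey-predator model until H or P reaches zero.

    Returns a list of (time, H, P) tuples, one per event.
    """
    rng = random.Random(seed)
    h, p, t = h0, p0, 0.0
    path = [(t, h, p)]
    for _ in range(max_steps):
        if h == 0 or p == 0:
            break                      # oscillatory character is lost
        events = [
            (1.0 * h, 1, 0),           # prey birth
            (0.5 * p, 0, -1),          # predator death
            (0.08 * h * p, -1, 0),     # prey destroyed, no predator born
            (0.02 * h * p, -1, 1),     # prey destroyed, one predator born
        ]
        total = sum(rate for rate, _, _ in events)
        t += rng.expovariate(total)    # waiting time to the next event
        u = rng.random() * total
        chosen = events[-1]            # fallback guards against rounding
        for event in events:
            if u < event[0]:
                chosen = event
                break
            u -= event[0]
        h += chosen[1]
        p += chosen[2]
        path.append((t, h, p))
    return path


path = simulate_prey_predator()
print(len(path) - 1, "events; final state (H, P) =", path[-1][1:])
```

Different seeds give different realizations; comparing several of them is one way to see the variation in amplitude and in the order of extinction described above.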

Fig. 1. An artificial realization of a stochastic model for the prey-predator relation.

Fig. 2. Prey-predator ‘cycles’ for the artificial stochastic series. The closed
deterministic cycle is also shown (the dotted curve) for comparison.

A quantitative comparison with suitable laboratory data would be useful, but there are
at present difficulties in any very precise comparisons. The first are theoretical, and arise
from the mathematical intractability of a complete solution even of these simple
models (it is for this reason that the construction of artificial series is particularly instructive).
The second arise from the extra complications present in real animal populations.
Thus a striking and somewhat unique example of oscillations obtained by Gause,
in which the prey were yeast cells Saccharomyces exiguus and the predator Paramecium
aurelia, has been criticized by Andrewartha & Birch (1954, p. 440) because, for instance, of
the tendency in such experiments for sedimentation of the yeast at the bottom of the [...].
While we should bear in mind such objections, we might notice that the data in this experiment
appear reasonably consistent with the above simple theory. Thus, from the values given
by Gause, we have $\alpha_1 \sim 1$, $\alpha_2 \sim 0.45$, and since from the graph it appears that $P_0 \sim 100$,
$H_0 \sim 1.5 \times 10^7$, we should conclude that $\beta_1 \sim 0.01$, $\beta_2 \sim 3 \times 10^{-8}$. From the theory of (small)
oscillations about $(H_0, P_0)$, the period of a cycle, which is approximately $2\pi/\sqrt{(\alpha_1\alpha_2)}$, is
calculated to be 9.4 days, a ‘prediction’ in quite fair agreement with the observed period
of about 8 days.* Moreover, while the ultimate extinction of the predator on the stochastic
model has been noted, the chances of extinction after one or two cycles may be relatively
small (as in the artificial series above), or even microscopically small in certain cases. In
spite of the absence of complete solutions, it is known in epidemic theory (Bartlett, 1956)
that such chances of extinction depend very critically on the magnitudes of the coefficients
occurring in the equations. An approximate device employed in such a context for assessing
the order of magnitude of the chances of extinction may be used in the present context also.
It consists in neglecting some of the variation in the numbers of the second species when the
number of the first species is low and liable to extinction. Thus if in the example under
discussion we note that $H$ is large and thus on the theoretical model fairly safe for survival
(any real ‘sedimentation effect’ would support this assumption though a strong dependence
between individuals of large colonies would not), we need merely consider the chance of
extinction of $P$ at the bottom of its cycle. At this point $H \sim H_0$, and the birth- and death-rates
for $P$ are equal. If this situation could maintain itself, $P$ would ultimately become extinct
(see, for example, Bartlett, 1955, p. 71). The fact that, for low $P$, $H$ is increasing will, however,
modify such a certainty. If we put roughly for the effective birth-rate $\alpha_2 + c \sin(2\pi t/T)$,
where $T$ is the period and $t$ is reckoned from the point of low $P$, then from known theory
(Kendall, 1948) the chance of extinction is given by $(1 + 1/J)^{-P} \sim e^{-P/J}$, where

$$J = \alpha_2 \int_0^t e^{\rho(\tau)}\,d\tau, \qquad \rho(t) = -\int_0^t c \sin(2\pi y/T)\,dy = (\tfrac12 cT/\pi)\,[\cos(2\pi t/T) - 1].$$

This gives, if $b = \tfrac12 T/\pi$,

$$J = \alpha_2 \int_0^t e^{bc[\cos(\tau/b) - 1]}\,d\tau.$$

This integral may be evaluated as a series of Bessel functions for general $t$, but while it is of
particular interest to us for $t$ of the order of $\tfrac12 T$, it is convenient to remember that the chance
of extinction of $P$ must be a strictly increasing function of $t$ and so we shall not underestimate
its value if we choose $t$ to be larger than this, say $T$. For this value of $t$

$$J = T\alpha_2\,e^{-bc} I_0(bc),$$

where $I_0(z)$ is the Bessel function of zero order (of the first kind, and of imaginary argument).
By inspection of the graph given by Gause, $c$ is about [...] $\alpha_2$. Hence, with $\alpha_2 = 0.45$ and
$T = 9.4$, it is found that $J = 2.6$, and thus when $P \sim 15$, the chance of extinction during the
critical part of the cycle is of order $e^{-5.7} \sim 0.003$.* Thus while it has been suggested by
Andrewartha & Birch (1954, p. 440) that Paramecium avoid extinction by growing smaller
when no food is available, these results seem at least reasonably compatible with the simple
mathematical model.

* The ‘observed’ period for the mock series (Fig. 1) suggests that the theoretical formula may
approximately be applied even for quite large oscillations.
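
The arithmetic of this paragraph can be retraced numerically. The sketch below assumes the full-cycle formula $J = T\alpha_2 e^{-bc} I_0(bc)$ with $b = T/2\pi$, and checks it against direct quadrature of the integral; the amplitude `c = 0.3` used in that check is a hypothetical value chosen only for illustration, and all function names are illustrative.

```python
import math


def bessel_i0(x, terms=40):
    """Modified Bessel function I_0(x) from its series sum (x^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x * x / 4.0) / ((k + 1) ** 2)
    return total


def cycle_period(alpha1, alpha2):
    """Approximate period of small oscillations, 2*pi / sqrt(alpha1*alpha2)."""
    return 2.0 * math.pi / math.sqrt(alpha1 * alpha2)


def j_full_cycle(alpha2, c, period):
    """Closed form of J when the upper limit t equals one full period T."""
    b = period / (2.0 * math.pi)
    return period * alpha2 * math.exp(-b * c) * bessel_i0(b * c)


def j_numeric(alpha2, c, period, t, n=4000):
    """Trapezoidal evaluation of J = alpha2 * integral of exp(rho) from 0 to t."""
    b = period / (2.0 * math.pi)

    def integrand(tau):
        return alpha2 * math.exp(b * c * (math.cos(tau / b) - 1.0))

    h = t / n
    inner = sum(integrand(i * h) for i in range(1, n))
    return h * (0.5 * integrand(0.0) + inner + 0.5 * integrand(t))


def extinction_chance(p, j):
    """Chance of extinction (1 + 1/J)^(-P) for P individuals at the low point."""
    return (1.0 + 1.0 / j) ** (-p)


print(cycle_period(1.0, 0.45))                      # period in days
print(extinction_chance(15, 2.6), math.exp(-15 / 2.6))
```

The exact expression $(1 + 1/J)^{-P}$ and the exponential approximation $e^{-P/J}$ agree only in order of magnitude when $J$ is as small as 2.6, which is all that the argument requires.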

The relevance of these extinction probabilities in ecology would be more convincing if 
further data were available where extinction did in fact occur, and calculation gave a high 
chance of extinction in such cases. First of all it may be checked that this chance is high 
for the artificial series of Fig. 1. Using the deterministic solution to calculate the order of
magnitude of low H or low P after one cycle, with the given initial conditions, the chance of 
extinction per cycle was calculated to be of the order 0-6 for P and 0-3 for H. However, 
a corresponding calculation for the experimental data on mites reported by Gause, Smarag- 
dova & Witt (1936) gave a very small chance of extinction per cycle, in contrast with the 
invariable extinction actually observed. The specific calculation was made for the following
series (in wheat flour): 


Days                              0    6   12   23   27   32   35   38   41   44   47
Prey (Aleuroglyphus agilis)      50   24   28  256  408  496  288   32    -    2    -
Predators (Cheyletus eruditus)    5    4   12   12   24   64   96  120   44   24    8
The numbers are estimated total numbers at all stages of growth, excluding eggs; quoted from Gause 
et al. (1936). 


Only very crude estimates ($\alpha_1 \sim 0.08$, $\alpha_2 \sim 0.3$, $\beta_1 \sim 0.004$, $\beta_2 \sim 0.003$) of the coefficients
were made from these data, as there appears no doubt from these and similar data that the
model is inappropriate because the true cycle is deterministically unstable. The effect of a
time lag in the growth of new individuals is one possible explanation of this, and is examined
further in the next section.


4. THE EFFECT OF A LAG IN BIRTHS 


It is assumed that there is a lag $\tau_1$ in the growth to maturity of the prey, so that (in terms of
adults) the increases may approximately for small $\tau_1$ be regarded as due to the prey present
at time $\tau_1$ previously. A lag $\tau_2$ is similarly assumed for new predators. This modifies equations
(6) to

$$DH(t) = \alpha_1 H(t - \tau_1) - \beta_1 P(t)H(t), \qquad DP(t) = -\alpha_2 P(t) + \beta_2 H(t - \tau_2)P(t - \tau_2). \tag{9}$$

In terms of the differential operator $D$, equations (9) can be written

$$Dh = \alpha_1 e^{-\tau_1 D}(1+h) - \alpha_1(1+h)(1+p), \qquad Dp = -\alpha_2(1+p) + \alpha_2 e^{-\tau_2 D}(1+h)(1+p), \tag{10}$$

* In this calculation the value 15 for P was taken as typical from the graph; if alternatively, as in the
further examples, it is calculated theoretically from the initial conditions, the
chance of extinction is even smaller.


Biom, 44 
3 
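where $H = H_0(1+h)$, $P = P_0(1+p)$. For $h$, $p$ (and $\tau_1$, $\tau_2$) small, equations (10) become
approximately

$$Dh = -\alpha_1 p - \alpha_1\tau_1 Dh, \qquad Dp = \alpha_2 h - \alpha_2\tau_2 D(h + p). \tag{11}$$

The character of small oscillations about $H_0$, $P_0$ is determined by the equation in $D$,

$$D^2(1 + \tau_1\alpha_1)(1 + \tau_2\alpha_2) - \alpha_1\alpha_2(\tau_2 D - 1) = 0,$$

giving rise to instability (for $\tau_2 > 0$) even on the deterministic formulation. The approximate
solution for $h$, $p$ is found to be

$$h \sim (A \cos\theta t + B \sin\theta t)\,e^{\frac12\alpha_1\alpha_2\tau_2 t},$$
$$\alpha_1 p \sim (A\theta \sin\theta t - B\theta \cos\theta t)(1 + \tau_1\alpha_1)\,e^{\frac12\alpha_1\alpha_2\tau_2 t} - \tfrac12\alpha_1\alpha_2\tau_2 h,$$

where $\theta^2 \sim \alpha_1\alpha_2(1 - \tau_1\alpha_1)(1 - \tau_2\alpha_2)$. Thus

$$\alpha_1 p^2(1 - \tau_1\alpha_1 + \tau_2\alpha_2) + \alpha_1\alpha_2\tau_2 hp + \alpha_2 h^2 \sim C e^{\alpha_1\alpha_2\tau_2 t}, \tag{12}$$

where $C = \alpha_2(A^2 + B^2)$ is a constant.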
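
The instability can be seen directly from the roots of the quadratic in $D$. The following sketch assumes that equation in the form $K D^2 - \alpha_1\alpha_2\tau_2 D + \alpha_1\alpha_2 = 0$ with $K = (1 + \tau_1\alpha_1)(1 + \tau_2\alpha_2)$; the function name and the numerical values are illustrative only.

```python
import cmath


def lag_roots(alpha1, alpha2, tau1, tau2):
    """Roots of K*D^2 - alpha1*alpha2*tau2*D + alpha1*alpha2 = 0.

    A positive real part means that small oscillations grow in amplitude.
    """
    k = (1.0 + tau1 * alpha1) * (1.0 + tau2 * alpha2)
    a, b, c = k, -alpha1 * alpha2 * tau2, alpha1 * alpha2
    disc = cmath.sqrt(b * b - 4.0 * a * c)
    return (-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)


# Hypothetical coefficients with a short lag for both species.
root, _ = lag_roots(1.0, 0.5, 0.1, 0.1)
print("growth rate:", root.real, "angular frequency:", abs(root.imag))

# With no lag for the predators the oscillation is neutral again.
print(lag_roots(1.0, 0.5, 0.1, 0.0)[0].real)
```

The real part is close to $\tfrac12\alpha_1\alpha_2\tau_2$, so it is the predator lag $\tau_2$, not the prey lag $\tau_1$, that drives the growth of the cycles.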
requiring a rather more elaborate study of age structure, would have to allow for such
stochastic fluctuations. The immediately relevant point, however, is that an effectively 
zero value of H at the end of the cycle follows from this slight modification of the model. 


5. THE EFFECT OF IMMIGRATION 


The stochastic instability of even the classical prey-predator model (6) naturally suggests 
that the theoretical effect of immigration should be considered (cf. Bartlett, 1956). In simple
epidemic models of the mixed susceptible and infected population (with infection by con- 
tagion) it is well known that epidemics cannot recur unless the susceptibles are renewed. 
A more difficult point is to what extent renewal of infecteds from outside is necessary, but 
recurrence is in any case made certain by such renewals. In the prey-predator system, the 
immigration of either prey or predator will prevent absorption at the corresponding axis 
H = 0 or P = 0, but absorption at the other is still possible. If, however, damping terms are
introduced into the deterministic model by such immigration, extinction may be greatly 
delayed. Finally, if immigration of both prey and predators is allowed, it is obvious that 
extinction is completely prevented. 

It was consequently interesting to find that Gause (1934) had introduced immigration 
experimentally into his laboratory cultures, to avoid the extinction difficulty. His theo- 
retical reasons seemed, however, rather ad hoc and incomplete, for we have seen that with 
small laboratory populations extinction would not, even in the absence of further com- 
plications, be necessarily unexpected on an explicit stochastic formulation. 

With immigration, equations (6) are modified to

$$DH = (\alpha_1 - \beta_1 P)H + \epsilon_1, \qquad DP = (-\alpha_2 + \beta_2 H)P + \epsilon_2. \tag{13}$$

The deterministic equilibrium point is now given by $(H_0, P_0)$, where

$$P_0^2\beta_1\alpha_2 - P_0(\beta_1\epsilon_2 + \beta_2\epsilon_1 + \alpha_1\alpha_2) + \alpha_1\epsilon_2 = 0. \tag{14}$$

There is one relevant positive root, as

$$(\beta_1\epsilon_2 + \beta_2\epsilon_1 + \alpha_1\alpha_2)^2 > 4\alpha_1\alpha_2\beta_1\epsilon_2,$$

and for small $\epsilon_1$, $\epsilon_2$,

$$P_0 \sim \frac{\alpha_1}{\beta_1} + \frac{\epsilon_1\beta_2}{\alpha_2\beta_1}, \qquad H_0 \sim \frac{\alpha_2}{\beta_2} - \frac{\epsilon_2\beta_1}{\alpha_1\beta_2}.$$

For small oscillations $h$, $p$, where $H = H_0(1+h)$, $P = P_0(1+p)$,

$$Dh \sim -\epsilon_1 h/H_0 - \beta_1 P_0 p, \qquad Dp \sim -\epsilon_2 p/P_0 + \beta_2 H_0 h, \tag{15}$$

and the equation in $D$,

$$D^2 + D(\epsilon_1\beta_2/\alpha_2 + \epsilon_2\beta_1/\alpha_1) + \beta_1\beta_2 H_0 P_0 = 0,$$

implies damping provided $\epsilon_1$ and/or $\epsilon_2 > 0$.

The deterministic damping operates at all amplitudes, at least for small $\epsilon_1$, $\epsilon_2$. For if the
function (cf. equation (7))

$$F(H, P) = \beta_1(P - P_0 \log P) + \beta_2(H - H_0 \log H)$$

is differentiated, it is found that

$$DF = -\left[\frac{\epsilon_1}{\alpha_2 H}(\beta_2 H - \alpha_2)^2 + \frac{\epsilon_2}{\alpha_1 P}(\beta_1 P - \alpha_1)^2\right] + O(\epsilon_1^2, \epsilon_2^2),$$

so that $F$ always decreases in time until it reaches its minimum value in the neighbourhood
of $(\alpha_2/\beta_2, \alpha_1/\beta_1)$, at the point $(H_0, P_0)$.

In view of the instability found in § 4 due to the lag from birth to reproductive age, it is
advisable to combine the two effects of such lags and immigration. Since the effects are
additive to the first order (for oscillations of small amplitude), it is readily seen that deterministic
damping will still be maintained provided

$$\epsilon_1/H_0 + \epsilon_2/P_0 > \alpha_1\alpha_2\tau_2. \tag{16}$$

As deterministic damping for small $\epsilon_1$, $\epsilon_2$ may be only slight, the amplitude of oscillations
may evidently be considerable even when a stochastic ‘steady state’ (or true stationarity)
is feasible.
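
The equilibrium with immigration and the damping condition can be checked numerically. The sketch below assumes the model $DH = (\alpha_1 - \beta_1 P)H + \epsilon_1$, $DP = (-\alpha_2 + \beta_2 H)P + \epsilon_2$, takes the larger root of the resulting quadratic in $P_0$, and compares the immigration damping $\epsilon_1/H_0 + \epsilon_2/P_0$ with the lag term $\alpha_1\alpha_2\tau_2$. Function names and the immigration rates are hypothetical.

```python
import math


def immigration_equilibrium(a1, b1, a2, b2, e1, e2):
    """Return (H0, P0) for the prey-predator model with immigration rates e1, e2."""
    qa = b1 * a2
    qb = b1 * e2 + b2 * e1 + a1 * a2
    qc = a1 * e2
    # The larger root keeps H0 positive; the smaller one does not.
    p0 = (qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)
    h0 = (a2 * p0 - e2) / (b2 * p0)    # from DP = 0
    return h0, p0


def net_damping(a1, b1, a2, b2, e1, e2, tau2=0.0):
    """Immigration damping minus the destabilizing lag term (first order)."""
    h0, p0 = immigration_equilibrium(a1, b1, a2, b2, e1, e2)
    return e1 / h0 + e2 / p0 - a1 * a2 * tau2


params = (1.0, 0.1, 0.5, 0.02, 0.5, 0.2)   # e1, e2 are illustrative values
h0, p0 = immigration_equilibrium(*params)
print("H0, P0:", h0, p0)
print("no lag:", net_damping(*params), " lag 0.1:", net_damping(*params, tau2=0.1))
```

A positive value indicates that deterministic damping is maintained; with these illustrative rates a predator lag of 0.1 is already enough to outweigh the immigration.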
6. COMPETITION BETWEEN TWO SPECIES 
In the generalization of equation (1) to two species, the simplest model is represented by
the two equations

$$DN_1 = (\alpha_1 - \beta_{11}N_1 - \beta_{12}N_2)N_1, \qquad DN_2 = (\alpha_2 - \beta_{21}N_1 - \beta_{22}N_2)N_2, \tag{17}$$

where $N_1$ and $N_2$ refer to the two species. A further simplification is obtained if it is assumed
that the restrictive action of the populations is sufficiently well represented by a combined
population size $N = N_1 + \lambda N_2$, so that $\beta_{12}/\beta_{11} = \beta_{22}/\beta_{21} = \lambda$ (see, for example, Kostitzin,
1939, p. 124). Equations (17) then provide equations*

$$DN_1 = (\alpha_1 - \beta_1 N)N_1, \qquad DN_2 = (\alpha_2 - \beta_2 N)N_2, \tag{18}$$

where $\beta_1 = \beta_{11}$, $\lambda\beta_2 = \beta_{22}$, whence

$$D(\beta_2 \log N_1 - \beta_1 \log N_2) = \alpha_1\beta_2 - \alpha_2\beta_1,$$

$$N_1^{\beta_2}/N_2^{\beta_1} = (n_1^{\beta_2}/n_2^{\beta_1})\,e^{(\alpha_1\beta_2 - \alpha_2\beta_1)t},$$

where $n_1$, $n_2$ are the initial values of $N_1$ and $N_2$. This result implies that $N_1$ or $N_2$ tends to
zero according as $\alpha_1/\beta_1$ is less than or greater than $\alpha_2/\beta_2$, and the remaining species obeys the
single logistic equation (1). This deterministic result is unlikely to be affected by stochastic
fluctuations unless $\alpha_1\beta_2 \approx \alpha_2\beta_1$, but in this case they may sometimes be important. In
particular, suppose the two species [...]

[...] and with no deterministic damping [...] stochastic drift will take place until either
$N_1$ or $N_2$ is zero. This point is noted because [...]

$$N_1 = \frac{\alpha_1\beta_{22} - \alpha_2\beta_{12}}{\beta_{11}\beta_{22} - \beta_{12}\beta_{21}}, \quad [...]$$

which is found to be stable if

$$\alpha_1\beta_{22} > \alpha_2\beta_{12}, \qquad \alpha_2\beta_{11} > \alpha_1\beta_{21},$$

and unstable if

$$\alpha_1\beta_{22} < \alpha_2\beta_{12}, \qquad \alpha_2\beta_{11} < \alpha_1\beta_{21}.$$

The last case is interesting, as it implies on the deterministic model that which species
survives depends on the initial conditions; hence in a complete stochastic model random
variation at the beginning of population growth might be a crucial factor in the survival
of one or other species.*

* It should be noted that (in contradiction of [...] there is no logical inconsistency [...]
are a balance of births and deaths [...]
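
The two cases can be distinguished numerically from the coefficients of the competition equations (17). The sketch below solves the two linear equilibrium conditions for $(N_1, N_2)$ and, as an assumption of this illustration, judges stability from the Jacobian at that point: its trace is negative whenever both populations are positive, so the equilibrium is stable exactly when its determinant $N_1 N_2(\beta_{11}\beta_{22} - \beta_{12}\beta_{21})$ is positive. Names and numbers are hypothetical.

```python
def coexistence_equilibrium(a1, a2, b11, b12, b21, b22):
    """Solve b11*N1 + b12*N2 = a1 and b21*N1 + b22*N2 = a2 for (N1, N2)."""
    det = b11 * b22 - b12 * b21
    if det == 0:
        raise ValueError("degenerate case: no isolated equilibrium point")
    n1 = (a1 * b22 - a2 * b12) / det
    n2 = (a2 * b11 - a1 * b21) / det
    return n1, n2


def coexistence_is_stable(a1, a2, b11, b12, b21, b22):
    """Return True/False for a positive equilibrium, None if there is none."""
    n1, n2 = coexistence_equilibrium(a1, a2, b11, b12, b21, b22)
    if n1 <= 0 or n2 <= 0:
        return None                       # one species excludes the other
    # Jacobian [[-b11*N1, -b12*N1], [-b21*N2, -b22*N2]] has negative trace,
    # so stability is decided by the sign of its determinant.
    return n1 * n2 * (b11 * b22 - b12 * b21) > 0


# Each species limits itself more than it limits the other: stable.
print(coexistence_is_stable(1.0, 1.0, 0.02, 0.01, 0.01, 0.02))
# Each species limits the other more than itself: unstable.
print(coexistence_is_stable(1.0, 1.0, 0.01, 0.02, 0.02, 0.01))
```

In the unstable case the survivor depends on the starting point, which is where random variation early in population growth becomes decisive.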

The Tribolium model assumed follows the lines suggested by Neyman et al. (1956),
where the competition between the two species, T. castaneum and T. confusum, is interpreted
more as a predator-prey interaction than a direct competition for food. The four
stages of growth are for simplicity condensed into two, referred to as ‘passive’ and ‘active’ 
(the adult stage), and the control of numbers arises apparently from a cannibalistic pro- 
pensity for the adult to consume the young or ‘passive’. This cannibalism appears to apply 
equally to the devouring of the young of either species, though the adult propensities may 
differ for the two species. 

The Neyman, Park and Scott model was formulated in terms of discrete generations. The 
general type of stochastic formulation adopted in the present paper can include such discrete
generation models (cf. § 4), but the essential points seem to emerge from the simplest
stochastic formulation in continuous time of the type represented by equation (3), and 
this will be used here. The single species case is of some interest for comparison with the 
straightforward logistic model, and is considered first. Spatial ‘diffusion’ of the beetle is 
ignored, though it may be an important further factor in the actual experiments. 

The deterministic form of the model assumed is

$$DP = -\mu AP - \nu P + \lambda A, \qquad DA = \nu P - \epsilon A, \tag{19}$$

where $P$ is the number of passive, $A$ of active, beetles, $\lambda$ and $\epsilon$ ($< \lambda$) are birth- and death-rates,
$\nu$ is the transition rate from passive to active, and $\mu$ is a ‘voracity’ coefficient. The
equilibrium point is

$$P_0 = (\lambda - \epsilon)/\mu, \qquad A_0 = \nu(\lambda - \epsilon)/(\mu\epsilon). \tag{20}$$

For small oscillations, put $P = P_0(1+p)$, $A = A_0(1+a)$. Then

$$Dp = -\lambda\nu p/\epsilon + \nu a - \nu(\lambda - \epsilon)ap/\epsilon \sim -\lambda\nu p/\epsilon + \nu a,$$
$$Da = \epsilon(p - a);$$

and the equation in $D$,

$$D^2 + D(\epsilon + \lambda\nu/\epsilon) + (\lambda - \epsilon)\nu = 0,$$

indicates damped oscillations. Note that for the function

$$f(p, a) = p - \log(1+p) + \nu[a - \log(1+a)]/\epsilon,$$

$$Df = -\frac{\lambda\nu p^2(1+a)}{\epsilon(1+p)} + \frac{\nu a(ap + 2p - a)}{1+a}
\le -\frac{\nu p^2(1+a)}{1+p} + \frac{\nu a(ap + 2p - a)}{1+a} = -\frac{\nu(a - p)^2}{(1+p)(1+a)},$$

so that $f$ decreases in time, implying that deterministic damping operates at large as well
as small amplitudes.

* The possibility of such an effect occurring with Tribolium was raised by Dr P. H. Leslie in the
discussion following a seminar I gave at Oxford on 15 May 1956.
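The exact stochastic equations assumed in place of (19) are (cf. equation (3))

$$\delta P = (-\nu P + \lambda A - \mu AP)\,\delta t - \delta Z_1 + \delta Z_2 - \delta Z_3,$$
$$\delta A = (\nu P - \epsilon A)\,\delta t + \delta Z_1 - \delta Z_4, \tag{21}$$

where the variances of $\delta Z_1$, $\delta Z_2$, $\delta Z_3$ and $\delta Z_4$ are indicated by the respective deterministic
terms $\nu P\,\delta t$, $\lambda A\,\delta t$, etc. With $P = P_0(1+p)$, $A = A_0(1+a)$ as before, and in the case of small
fluctuations,

$$\delta p \sim (-\lambda\nu p/\epsilon + \nu a)\,\delta t + (\delta Z_2 - \delta Z_1 - \delta Z_3)/P_0,$$
$$\delta a \sim \epsilon(p - a)\,\delta t + (\delta Z_1 - \delta Z_4)/A_0. \tag{22}$$

While presumably on the above model extinction of the population will occur after a long
enough time,* this may (for a deterministic ‘ceiling’ population not too small, but fluctuations
relatively small) be so long delayed as to be negligible and an effective or quasi-stationarity
be established. Under such conditions, equations (22) yield, on squaring or
multiplying together the expressions for $p + \delta p$, $a + \delta a$, and averaging,

$$0 \sim -2\lambda\nu\sigma_p^2/\epsilon + 2\nu\,\mathrm{cov}(p, a) + (\lambda A_0 + \nu P_0 + \mu A_0 P_0)/P_0^2,$$
$$0 \sim 2\epsilon\,\mathrm{cov}(p, a) - 2\epsilon\sigma_a^2 + (\nu P_0 + \epsilon A_0)/A_0^2,$$
$$0 \sim -(\lambda\nu/\epsilon)\,\mathrm{cov}(p, a) + \nu\sigma_a^2 + \epsilon\sigma_p^2 - \epsilon\,\mathrm{cov}(p, a) - \nu P_0/(P_0 A_0),$$

whence

$$\sigma_p^2 = \frac{\lambda(\lambda\nu + \epsilon^2 - \epsilon\nu)}{P_0(\lambda - \epsilon)(\lambda\nu + \epsilon^2)}, \qquad
\sigma_a^2 = \frac{\epsilon^2(\lambda - \epsilon) + \lambda^2\nu}{A_0(\lambda - \epsilon)(\lambda\nu + \epsilon^2)}, \qquad
\mathrm{cov}(p, a) = \frac{\lambda\epsilon^2}{P_0(\lambda - \epsilon)(\lambda\nu + \epsilon^2)}. \tag{23}$$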
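
The quasi-stationary second moments can be verified by substituting them back into the three averaged moment equations. The sketch below assumes the single-species model with birth-rate $\lambda$, death-rate $\epsilon$, transition rate $\nu$ and voracity $\mu$, with noise variances equal to the corresponding deterministic rates; the closed forms coded here are the ones that make all three residuals vanish, and the function names are illustrative.

```python
def tribolium_moments(lam, mu, nu, eps):
    """Equilibrium (P0, A0) and quasi-stationary var(p), var(a), cov(p, a)."""
    p0 = (lam - eps) / mu
    a0 = nu * (lam - eps) / (mu * eps)
    d = (lam - eps) * (lam * nu + eps ** 2)
    var_p = lam * (lam * nu + eps ** 2 - eps * nu) / (p0 * d)
    var_a = (eps ** 2 * (lam - eps) + lam ** 2 * nu) / (a0 * d)
    cov = lam * eps ** 2 / (p0 * d)
    return p0, a0, var_p, var_a, cov


def moment_residuals(lam, mu, nu, eps):
    """Left-hand sides of the three averaged moment equations (all should be 0)."""
    p0, a0, var_p, var_a, cov = tribolium_moments(lam, mu, nu, eps)
    r1 = (-2.0 * lam * nu * var_p / eps + 2.0 * nu * cov
          + (lam * a0 + nu * p0 + mu * a0 * p0) / p0 ** 2)
    r2 = 2.0 * eps * cov - 2.0 * eps * var_a + (nu * p0 + eps * a0) / a0 ** 2
    r3 = (-(lam * nu / eps) * cov + nu * var_a + eps * var_p
          - eps * cov - nu * p0 / (p0 * a0))
    return r1, r2, r3


# Illustrative values: lambda = 2, mu = 0.05, nu = 1, epsilon = 1.
print(tribolium_moments(2.0, 0.05, 1.0, 1.0))
print(moment_residuals(2.0, 0.05, 1.0, 1.0))
```

For these values the equilibrium is 20 passive and 20 active beetles, with coefficients of variation of roughly a quarter, so fluctuations are far from negligible at this population size.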


When the two Tribolium species are put together, they are denoted by suffices 1 and 2,
and the following (deterministic) equations assumed:

$$DP_1 = -\mu_1 P_1 A_1 - \mu_2 P_1 A_2 - \nu_1 P_1 + \lambda_1 A_1, \qquad DA_1 = \nu_1 P_1 - \epsilon_1 A_1,$$
$$DP_2 = -\mu_1 P_2 A_1 - \mu_2 P_2 A_2 - \nu_2 P_2 + \lambda_2 A_2, \qquad DA_2 = \nu_2 P_2 - \epsilon_2 A_2. \tag{24}$$

An attempt to find a single equilibrium point yields the two (in general) incompatible
equations

$$\frac{\mu_1\nu_1}{\epsilon_1}P_1 + \frac{\mu_2\nu_2}{\epsilon_2}P_2 - \frac{\lambda_1\nu_1}{\epsilon_1} + \nu_1 = 0,$$
$$\frac{\mu_1\nu_1}{\epsilon_1}P_1 + \frac{\mu_2\nu_2}{\epsilon_2}P_2 - \frac{\lambda_2\nu_2}{\epsilon_2} + \nu_2 = 0, \tag{25}$$

these representing parallel lines in the $P_1$, $P_2$ plane (cf. the discussion in Neyman et al. 1956).

* There is strictly no stochastic upper limit to the population in contrast with the simpler model
assumed in § 2, though the effective limit implied by the non-linear ‘voracity’ term suggests that the
ultimate extinction property should still hold.

Before attempting to consider the problem of stability in relation to the zone thus
defined, it is instructive to consider the case when the two species become identical in
behaviour. In this case, if

$$P_1 + P_2 = X, \quad A_1 + A_2 = Y, \quad P_1 - P_2 = U, \quad A_1 - A_2 = V,$$

$$DX = -\mu XY - \nu X + \lambda Y, \qquad DY = \nu X - \epsilon Y, \tag{26}$$

where $\lambda_1 = \lambda_2 = \lambda$, etc., and

$$DU = -\mu UY - \nu U + \lambda V, \qquad DV = \nu U - \epsilon V. \tag{27}$$

Fig. 4. Graph of total numbers of two competing Tribolium species in a simplified
stochastic model: symmetric case (the dotted line is the ‘equilibrium’ line). Axes: first species,
second species.

The first pair of equations in $X$ and $Y$ are (as they must be) identical with (19), and refer
merely to the total population. As $Y$ is damped, it may approximately, even in the stochastic
model if numbers are not too small, be replaced by its equilibrium value $\nu(\lambda - \epsilon)/(\mu\epsilon)$.
Then equations (27) become

$$DU = -(\lambda\nu/\epsilon)U + \lambda V, \qquad DV = \nu U - \epsilon V,$$

with the ‘equilibrium’ line $U/V = \epsilon/\nu$. Hence in the stochastic model there will be random
drift along this line until the extinction of one or other species [...].
This is not altogether unexpected, for these symbols now [...]
lines of the same species. The case of two similar species [...]
limiting symmetrical case, for which the chance of extinction of a given species, with
symmetrical initial conditions, must be $\tfrac12$. However, if [...] the
ratios $A_1/P_1 = \nu_1/\epsilon_1$, $A_2/P_2 = \nu_2/\epsilon_2$ are inserted, the incompatible equations
(25) are replaced by

$$D \log P_1 = \alpha_1 - \frac{\mu_1\nu_1}{\epsilon_1}P_1 - \frac{\mu_2\nu_2}{\epsilon_2}P_2, \qquad
D \log P_2 = \alpha_2 - \frac{\mu_1\nu_1}{\epsilon_1}P_1 - \frac{\mu_2\nu_2}{\epsilon_2}P_2, \tag{28}$$

where $\alpha_1 = \lambda_1\nu_1/\epsilon_1 - \nu_1$, $\alpha_2 = \lambda_2\nu_2/\epsilon_2 - \nu_2$, etc., a particular case of the competition
equations (17). From (28),

$$D \log(P_1/P_2) = \alpha_1 - \alpha_2,$$

and $P_1/P_2$ tends to increase or decrease according as $\alpha_1$ is greater or less than $\alpha_2$. Hence
there will be a tendency, as the differences in the characteristics of the two species increase,
for the superiority of one species to become more marked and lead more frequently to the
survival of that species.

Fig. 5. Graph of total numbers of two competing Tribolium species in a simplified stochastic
model: asymmetric case (the dotted line joins the equilibrium points on the two axes). Axes: first
species, second species.

Fig. 6. Total numbers of individuals of the two competing species plotted
against time (arbitrary units) in the asymmetric stochastic model.


A preliminary illustration of these last conclusions is depicted in Figs. 4 and 5. In the 
first a Monte Carlo realization is shown (for the total numbers of each species) for the 
symmetrical case, with the rather arbitrary values $\lambda = 2$, $\mu = 0.05$, $\nu = 1$, $\epsilon = 1$, beginning with 
$P_i = A_i = 10$. The 'equilibrium' line along which stochastic drift occurs is shown dotted. 
(The corresponding time realization has not been completed, but an idea of the relative 
time elapsing before extinction of one species occurred is obtained from the number of 
steps represented in the graphs, viz. 1224 for Fig. 4 and 652 for Fig. 5.) In the case shown in 
Fig. 5, $\lambda_1$ was altered to 3 and $\epsilon_1$ to 2 (with $\lambda_2$ still 2, $\epsilon_2 = 1$), and the bias towards extinction 
of species 1 thus created ($\alpha_2 > \alpha_1$) was supported by the corresponding (and more rapid) 
extinction in the realization. In this case the time realization is also shown, in Fig. 6. 
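
The kind of artificial realization described above can be sketched in a few lines. The following is only an illustration under stated assumptions, not the procedure actually used for Figs. 4 to 6: it assumes that each term of the deterministic equations is read as the rate of a unit jump (loss of an immature individual at rate $\mu P_i Y$, maturation at rate $\nu_i P_i$, birth at rate $\lambda_i A_i$, adult death at rate $\epsilon_i A_i$), and the function names `alpha` and `simulate` are hypothetical.

```python
import random


def alpha(lam, nu, eps):
    """Index alpha_i = lambda_i * nu_i / eps_i - nu_i used to compare the species."""
    return lam * nu / eps - nu


def simulate(lam, eps, nu=1.0, mu=0.05, start=10, seed=1, max_steps=200_000):
    """Event-by-event realization of the assumed two-species jump process.

    lam, eps: pairs (species 1, species 2). Returns the number of steps taken
    and the total numbers (P_i + A_i) of the two species when the run stops,
    i.e. when one species is extinct or max_steps is reached.
    """
    rng = random.Random(seed)
    P = [start, start]  # immature individuals of species 1 and 2
    A = [start, start]  # adults of species 1 and 2
    steps = 0
    while steps < max_steps and P[0] + A[0] > 0 and P[1] + A[1] > 0:
        Y = A[0] + A[1]
        events = []
        for i in (0, 1):
            events.append((mu * P[i] * Y, i, "lost"))    # assumed: P_i -> P_i - 1
            events.append((nu * P[i], i, "mature"))      # P_i -> P_i - 1, A_i -> A_i + 1
            events.append((lam[i] * A[i], i, "birth"))   # P_i -> P_i + 1
            events.append((eps[i] * A[i], i, "death"))   # A_i -> A_i - 1
        total = sum(rate for rate, _, _ in events)
        pick = rng.random() * total
        chosen = None
        for rate, i, kind in events:
            if rate <= 0:
                continue  # never select an impossible event
            chosen = (i, kind)
            pick -= rate
            if pick < 0:
                break
        i, kind = chosen
        if kind == "lost":
            P[i] -= 1
        elif kind == "mature":
            P[i] -= 1
            A[i] += 1
        elif kind == "birth":
            P[i] += 1
        else:
            A[i] -= 1
        steps += 1
    return steps, (P[0] + A[0], P[1] + A[1])


# Asymmetric case of Fig. 5: lambda_1 = 3, eps_1 = 2, lambda_2 = 2, eps_2 = 1.
print(alpha(3.0, 1.0, 2.0), alpha(2.0, 1.0, 1.0))
print(simulate(lam=(3.0, 2.0), eps=(2.0, 1.0)))
```

Counting jumps rather than elapsed time mirrors the "number of steps" quoted in the text; a time realization would additionally draw an exponential waiting time with mean equal to the reciprocal of the total rate at each step.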
No quantitative comparison with actual Tribolium data has been envisaged at this 
stage; this will require further investigation. The main object in this preliminary discussion 
has been to demonstrate qualitatively some of the observational features. Even with 
the present simple Tribolium model (adapted from that formulated by Neyman, Park and 
Scott) the times to extinction need considering further. It has been suggested above that 
the duration of any experiment is short compared with the extinction time of a single 
species, in contrast with the extinction time of one or other of the two competing species, 
but obviously more definite orders of magnitude for these extinction times would be useful. 


I am very grateful to David G. Kendall for some critical comments on my first draft of 
this paper. I am also indebted to Mrs L. Linnert and Miss C. Caley for assistance with the 
construction of the artificial realizations shown in the figures. 
THE CONSISTENCY AND ADEQUACY OF THE POISSON-MARKOFF 
MODEL FOR DENSITY FLUCTUATIONS 


By V. T. PATIL 


University of Manchester 


SUMMARY. The Poisson-Markoff or emigration-immigration process (as defined in § 1.2) has been 
used as an approximate model for the number of independently moving particles present in particular 
regions of space, for example, as an approximate model in spermatozoa studies. The postulation of 
such a model, valid for all modes of division of the space, is shown to be inconsistent in the sense of 
leading to contradictions, except when $\tau$, the total interval of time considered, is small; the 
assumptions (cf. § 1.3) about the infinitesimal transition probabilities then ensure the consistency of the 
model to $O(\tau)$. To study the adequacy of the Poisson-Markoff model for some data of spermatozoa 
counts, the asymptotic $\chi^2$-goodness of fit test based on serial correlations is employed. The goodness 
of fit tests indicate a significant departure of the spermatozoa data from the Poisson-Markoff model, 
the departure being not so striking in the uniregional case as in the multiregional case. 


PART I 

1.1. Consider a system of particles performing some type of stochastic motion in an 
infinite space. Let $R_1, R_2, \ldots, R_k$ (any k finite non-overlapping regions in the space) 
and the complementary set $R^*$ represent a mode of division of the space, and let 
$N_{r,t}$ $(t \geq 0)$ be the number of particles in the region $R_r$ at time t, $N_t$ being the column 
vector with $N_{r,t}$ $(r = 1, 2, \ldots, k)$ as its elements. Then the full specification of the 
stochastic motion is sufficient to specify the process $\{N_t\}$. However, not only is the 
converse not true, but also there may not always be a type of stochastic motion leading 
to the specified process $\{N_t\}$. In § 1.5, it will be shown that the postulation of the Poisson- 
Markoff or emigration-immigration process for $N_t$ (cf. § 1.2), when assumed to be valid 
for all modes of division of the space (as has been used in connexion with spermatozoa 
studies), leads to contradictions and hence is not compatible with any type of stochastic 
motion. 


1.2. The Poisson-Markoff or emigration-immigration process. For setting up this process, 
we start with the following assumptions: 

(1) The particles are moving independently of each other. 

(2) The probability that precisely one of the particles in the region $R_r$ at time t moves 
into $R_s$ in time dt is $N_{r,t}\lambda_{rs}\,dt + o(dt)$. 

(3) The probability that precisely one of the particles in $R_r$ at time t moves into $R^*$ in 
time dt is $N_{r,t}\lambda_r^*\,dt + o(dt)$. 

(4) The probability that precisely one of the particles in $R^*$ at time t moves into $R_r$ in 
time dt is $\mu_r\,dt + o(dt)$. 

The process $\{N_t\}$ is then Markovian (of the first order) and the distribution of $N_t$ at a single 
instant is, as pointed out below, that of k independent Poisson variates. This Poisson- 
Markoff process is also called the emigration-immigration process (Bartlett, 1949). The 
parameters $\lambda_{rs}$, $\lambda_r^*$ and $\mu_r$ will remain unspecified for the moment. The detailed behaviour 
of the Poisson-Markoff process is discussed below. 
Let (i) the 'stochastic interaction-rate' matrix (cf. Ruben & Rothschild, 1955) 

\[
\begin{pmatrix}
\lambda_1^* + \sum_{s \neq 1}\lambda_{1s} & -\lambda_{12} & -\lambda_{13} & \cdots & -\lambda_{1k} \\
-\lambda_{21} & \lambda_2^* + \sum_{s \neq 2}\lambda_{2s} & -\lambda_{23} & \cdots & -\lambda_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
-\lambda_{k1} & -\lambda_{k2} & -\lambda_{k3} & \cdots & \lambda_k^* + \sum_{s \neq k}\lambda_{ks}
\end{pmatrix}
\quad (1)
\]

be denoted by $\Lambda$; 

(ii) $\kappa_1, \ldots, \kappa_k$ be the latent roots† of the matrix $\Lambda$ and K the diagonal matrix with 
$\kappa_1, \ldots, \kappa_k$ as elements; and 

(iii) the latent row vector matrix of $\Lambda$ be denoted by $\Theta$. 

Then in statistical equilibrium the probability generating function of $N_{t+\tau}$ and $N_t$ is 
given by 

\[ \Pi_{t,t+\tau}(z_t, z_{t+\tau}) = \exp\{m'(z_t - 1) + m'(z_{t+\tau} - 1) + m'(Z_t - I)\,P(\tau)\,(z_{t+\tau} - 1)\}, \quad (2) \]

where the meaning of the symbols in (2) is as follows. The column vector m is defined by 
$\mu' = m'\Lambda$, $\mu$ being the column vector with $\mu_1, \ldots, \mu_k$ as elements; $(z_T - 1)$ $(T = t, t+\tau)$ is 
the vector with $z_{r,T} - 1$ $(r = 1, 2, \ldots, k)$ as elements, $z_{r,T}$ being the auxiliary variable 
corresponding to $N_{r,T}$; $Z_t - I$ is the diagonal matrix with the elements $z_{r,t} - 1$ $(r = 1, \ldots, k)$; and 
$P(\tau) = \Theta^{-1}e^{-K\tau}\Theta$. 

The proof of result (2), on the same lines as in Bartlett (1949), is given in Appendix 1. 
The marginal distribution of $N_t$ is, from (2), 

\[ \Pi_t(z_t) = \exp\{m'(z_t - 1)\}, \quad (3) \]

that is, that of k independent Poisson variates. As the process $\{N_t\}$ is Markovian, the two- 
point probability distribution (2) of $N_t$ is sufficient to specify the process fully. 
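
For a single region (k = 1) the statement that the equilibrium distribution is Poisson can be checked directly. The sketch below is an illustration only: it assumes one region with immigration rate `mu` and emigration rate `lam_star` per particle, so that the relation $\mu' = m'\Lambda$ reduces to $m = \mu/\lambda^*$, and it verifies the balance between the probability flow from n to n + 1 particles and the flow back. The function names are hypothetical.

```python
import math


def poisson_pmf(n, mean):
    """Probability of n under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


def balance_residual(mu, lam_star, n_max=40):
    """Largest violation of the balance mu * pi(n) = (n + 1) * lam_star * pi(n + 1)
    when pi is Poisson with mean m = mu / lam_star (one-region case)."""
    m = mu / lam_star
    worst = 0.0
    for n in range(n_max):
        inflow = mu * poisson_pmf(n, m)                        # n -> n + 1 (immigration)
        outflow = (n + 1) * lam_star * poisson_pmf(n + 1, m)   # n + 1 -> n (emigration)
        worst = max(worst, abs(inflow - outflow))
    return m, worst


m, worst = balance_residual(mu=6.0, lam_star=0.5)
print(m, worst)
```

The residual vanishes up to rounding error, which is the one-region instance of the Poisson marginal distribution (3).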


1.3. The kinematic hypothesis. Nothing has been assumed about the form of the $\lambda$'s and 
the $\mu$'s as yet. The following assumptions about them are now made, the assumptions being 
in agreement with the collision frequency considerations in the kinetic theory of gases. 
The proof of their agreement (cf. Rothschild, 1953a) is given in Appendix 2. 

Let $A_r$ be the area of the region $R_r$, $B_{rs}$ be the length of the part of its boundary 
which it shares with $R_s$ (where $r \neq s$) and $L_r$ the 'free' boundary of $R_r$ (which it shares 
with $R^*$). For some values of r, $L_r$ may be zero; $\sum_s B_{rs} + L_r$ will always be the perimeter 
of $R_r$. 

The kinematic hypothesis then asserts that 

\[ \text{(i)}\ \lambda_{rs} = p_{rs}B_{rs}/A_r, \qquad \text{(ii)}\ \lambda_r^* = p_r^*L_r/A_r, \qquad \text{and (iii)}\ \mu_r = m_0 p_r L_r, \quad (4) \]

where the p's are to be so chosen that the condition $m_0 A'\Lambda = \mu'$ is satisfied, c being 
the mean speed of the particles, $m_0$ the mean number of particles per unit area and 
$A'$ the vector $(A_1, \ldots, A_k)$. It is easy to verify that the p's thus chosen are all equal 
to $c/\pi$. 

The number of parameters is thus reduced to two, namely, c and $m_0$. Thus the combination 
of the Poisson-Markoff process with the kinematic hypothesis leads to a two-parameter 
stochastic process, which has in fact been used in the study of the density fluctuations of 
spermatozoa and which we wish to examine here. (c and $m_0$ may be any two parameters 
and not necessarily the mean speed and the mean density of the particles, A being replaced 
by some other relevant vector, and the discussion below still holds.) 

† It is assumed that corresponding to an r-ple latent root, there exist r linearly independent 
latent vectors, whatever matrix $\Lambda$ is. This is true in the particular case considered in § 1.5, 
where $\Lambda$ is a symmetric matrix. 
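
The balance condition $m_0A'\Lambda = \mu'$ can be checked numerically for any small configuration. The sketch below is illustrative only: it takes all the p's equal to $c/\pi$, as stated in the text, builds $\Lambda$ and $\mu$ from assumed areas and boundary lengths, and compares the two sides. The geometry (a 2 × 1 rectangle adjoining a unit square along a side of length 1) and the function name `kinematic_rates` are hypothetical.

```python
import math


def kinematic_rates(areas, shared, free, c, m0):
    """Interaction-rate matrix Lambda and immigration vector mu under the
    kinematic hypothesis with every p equal to c / pi.

    areas[r]     : area A_r of region r
    shared[r][s] : boundary length B_rs shared by regions r and s (symmetric)
    free[r]      : free boundary length L_r shared with the complement R*
    """
    k = len(areas)
    p = c / math.pi
    lam = [[p * shared[r][s] / areas[r] if r != s else 0.0 for s in range(k)]
           for r in range(k)]
    lam_star = [p * free[r] / areas[r] for r in range(k)]
    mu = [m0 * p * free[r] for r in range(k)]
    Lam = [[lam_star[r] + sum(lam[r]) if r == s else -lam[r][s] for s in range(k)]
           for r in range(k)]
    return Lam, mu


areas = [2.0, 1.0]
shared = [[0.0, 1.0], [1.0, 0.0]]
free = [5.0, 3.0]            # perimeter minus shared boundary
c, m0 = 1.5, 10.0
Lam, mu = kinematic_rates(areas, shared, free, c, m0)
lhs = [m0 * sum(areas[r] * Lam[r][s] for r in range(2)) for s in range(2)]
print(lhs, mu)               # the two vectors agree
```

The agreement holds because the flow across each shared boundary cancels between neighbours when the density is uniform, leaving only the free-boundary terms.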
1.4. There are two aspects of this Poisson-Markoff process to be investigated. First, is 
the Poisson-Markoff process, when supposed to be valid for all modes of division of the space, 
consistent in the sense of being free from contradictions? If it is not, then evidently no type 
of stochastic motion can lead to it. Secondly, does the Poisson-Markoff process, irrespective 
of its theoretical appropriateness, give a 'good fit' to the data when the method of dividing 
up the space has been agreed upon beforehand and fixed? The second aspect is dealt with 
in relation to some data of spermatozoa counts in Part II of this paper. Regarding the first 
part, it is shown below that the postulation of the Poisson-Markoff process valid for all 
modes of division of the space is not consistent. As a simple case, consider the Poisson- 
Markoff model set up for $N_t$ in three contiguous regions $R_1$, $R_2$ and $R_3$. By averaging over 
the region $R_3$, the 'marginal' process concerning $R_1$ and $R_2$ alone can be derived. Also just 
these two regions could have been considered at first and the Poisson-Markoff process for 
$N_t$ in these two regions could have been set up. Then the 'consistency' principle would 
require that the two ways of finding the joint distribution of $N_{1,t}$ and $N_{2,t}$ should lead to the 
same answer. The consistency principle could be generalized to all the possible arrays of 
regions and selections from them. Here we consider the Poisson-Markoff process for $N_t$ in 
a two-dimensional network of congruent rectangles, to be commonly met with in practice, 
and illustrate how the consistency principle breaks down. 

1.5. We set up the Poisson-Markoff process with the kinematic hypothesis for $N_t$ in 
this two-dimensional network of $k'$ rows and k columns as shown in the accompanying figure 
and then consider the marginal process $\{N_t\}$ in one of the $k \times k'$ regions of the network. 

[Figure: rectangular network of k columns and $k'$ rows.]

We first specify the stochastic interaction-rate matrix $\Lambda$ and the correlation matrix $P(\tau)$ 
for this network of $k \times k'$ regions. For this case, 

(1) $A_r = A$, say, for all r. 

(2) $B_{rs} = B$, when $R_r$ and $R_s$ are contiguous regions in the same row; 
$B_{rs} = Bp'$, when $R_r$ and $R_s$ are contiguous regions in the same column; 
$B_{rs} = 0$, otherwise. 

(3) $(\sum_s B_{rs} + L_r)/B = p$, for every region. 

Obviously $p = 2(1 + p')$. 
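Let $Bc/(\pi A)$ be denoted by $\lambda$. Then 

\[
\Lambda_{kk'} =
\begin{pmatrix}
\Lambda_k & -p'\lambda I_k & (0) & \cdots & (0) \\
-p'\lambda I_k & \Lambda_k & -p'\lambda I_k & \cdots & (0) \\
(0) & -p'\lambda I_k & \Lambda_k & \cdots & (0) \\
\vdots & & \ddots & \ddots & -p'\lambda I_k \\
(0) & (0) & \cdots & -p'\lambda I_k & \Lambda_k
\end{pmatrix},
\quad (5)
\]

where $\Lambda_k$ is the $k \times k$ matrix 

\[
\Lambda_k = \lambda
\begin{pmatrix}
p & -1 & 0 & \cdots & 0 \\
-1 & p & -1 & \cdots & 0 \\
0 & -1 & p & \cdots & 0 \\
\vdots & & \ddots & \ddots & -1 \\
0 & 0 & \cdots & -1 & p
\end{pmatrix}
\]

and $I_k$ is the unit $k \times k$ matrix. 
The matrix $\Lambda_{kk'}$ may also be expressed as 

\[ \Lambda_{kk'} = \Lambda_k\,(\times)\,I_{k'} + I_k\,(\times)\,J_{k'}, \]

where the symbol $(\times)$ denotes the outer product, $I_{k'}$ the unit $k' \times k'$ matrix and $J_{k'}$ is the 
$k' \times k'$ matrix 

\[
J_{k'} = p'\lambda
\begin{pmatrix}
0 & -1 & 0 & \cdots & 0 \\
-1 & 0 & -1 & \cdots & 0 \\
0 & -1 & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & -1 \\
0 & 0 & \cdots & -1 & 0
\end{pmatrix}.
\]

For obtaining the latent roots and latent vectors of $\Lambda_{kk'}$ we observe that 

(1) the latent roots of $\Lambda_k$ are 
\[ \kappa_r = \lambda(p - 2\cos\omega_r) \quad (r = 1, 2, \ldots, k), \]
where $\omega_r = r\pi/(k+1)$; 

(2) its latent row vector $v_r'$ corresponding to $\kappa_r$ is 
\[ \sqrt{\frac{2}{k+1}}\,(\sin k\omega_r,\ \sin(k-1)\omega_r,\ \ldots,\ \sin\omega_r) \quad \text{(Montroll, 1947)}; \]

(3) the latent roots of $J_{k'}$ are 
\[ l_r = -2\lambda p'\cos\omega_r' \quad (r = 1, 2, \ldots, k'), \]
where $\omega_r' = r\pi/(k'+1)$; 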
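(4) the latent vectors of $J_{k'}$ are the same as those of $\Lambda_{k'}$. We denote the latent root matrices 
of $\Lambda_k$ and $J_{k'}$ by $K_k$ and $L_{k'}$ respectively and their latent vector matrices by $\Theta_k$ and $\Theta_{k'}$ 
respectively. Then the latent root matrix and the latent vector matrix of $\Lambda_{kk'}$ are respectively 
given by 

\[ K_{kk'} = K_k\,(\times)\,I_{k'} + I_k\,(\times)\,L_{k'}, \qquad \Theta_{kk'} = \Theta_k\,(\times)\,\Theta_{k'}, \quad (6) \]

for 

\[
\begin{aligned}
\Theta_{kk'}\Lambda_{kk'}\Theta_{kk'}' &= [\Theta_k\,(\times)\,\Theta_{k'}]\,[\Lambda_k\,(\times)\,I_{k'} + I_k\,(\times)\,J_{k'}]\,[\Theta_k'\,(\times)\,\Theta_{k'}'] \\
&= [\Theta_k\Lambda_k\Theta_k']\,(\times)\,[\Theta_{k'}I_{k'}\Theta_{k'}'] + [\Theta_k I_k\Theta_k']\,(\times)\,[\Theta_{k'}J_{k'}\Theta_{k'}'] \\
&= K_k\,(\times)\,I_{k'} + I_k\,(\times)\,L_{k'},
\end{aligned}
\]

on our observing that $\Theta_{kk'}' = \Theta_k'\,(\times)\,\Theta_{k'}'$ and that, as $\Lambda_k$, $J_{k'}$ are symmetric matrices, $\Theta_k$ and 
$\Theta_{k'}$ may be taken to be orthogonal matrices. 

As the regions are equal in area, the m's are equal and the matrix 

\[ P(\tau) = \Theta^{-1}e^{-K\tau}\Theta = \Theta'e^{-K\tau}\Theta, \]

$\Theta$ being an orthogonal matrix, reduces to the correlation matrix between $N_t$ and $N_{t+\tau}$. 

Making use of (6) it can be easily shown that (i) the auto-correlation coefficient for the region 
in the lth column and $l'$th row is 

\[
\rho_{ll'}(\tau) = \frac{2}{k+1}\Big[\sum_{r=1}^{k}\sin^2(l\omega_r)\,e^{-\lambda\tau(p - 2\cos\omega_r)}\Big]
\times \frac{2}{k'+1}\Big[\sum_{r=1}^{k'}\sin^2(l'\omega_r')\,e^{2\lambda p'\tau\cos\omega_r'}\Big], \quad (7)
\]

and (ii) the cross-correlation coefficient pertaining to the region in the $l_1$th column and the 
$l_1'$th row and the region in the $l_2$th column and the $l_2'$th row is 

\[
\frac{2}{k+1}\Big[\sum_{r=1}^{k}\sin(l_1\omega_r)\sin(l_2\omega_r)\,e^{-\lambda\tau(p - 2\cos\omega_r)}\Big]
\times \frac{2}{k'+1}\Big[\sum_{r=1}^{k'}\sin(l_1'\omega_r')\sin(l_2'\omega_r')\,e^{2\lambda p'\tau\cos\omega_r'}\Big]. \quad (8)
\]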

The two-point probability distribution (2) for $N_t$ in the network of $k \times k'$ regions is thus 
completely specified, and thus the process $\{N_t\}$, being Markovian, may be fully specified. 
From this multivariate process, the process $\{N_t\}$ pertaining to the region in the lth column 
and $l'$th row may be obtained as a marginal process. In particular, the two-point probability 
distribution corresponding to this process is 

\[ \Pi_{t,t+\tau}(z_t, z_{t+\tau}) = \exp\{m(z_t - 1) + m(z_{t+\tau} - 1) + m\rho_{ll'}(\tau)(z_t - 1)(z_{t+\tau} - 1)\}. \quad (9) \]

The consistency principle requires that the distribution (9) should be invariant to changes 
in l, $l'$, k and $k'$. An examination of the expression (7) for $\rho_{ll'}(\tau)$ shows that this is not the 
case. Similarly, the consideration of other more complicated situations would be expected 
to lead to the same conclusion. The result (9) can easily be verified for simpler cases, say, for 

(i) $k = k' = 1$, $l = l' = 1$ (i.e. the Poisson-Markoff model for region $R_1$ alone is 
considered), and 

(ii) $k = 2$, $k' = 1$, $l = l' = 1$ (i.e. the Poisson model for region $R_1$ is obtained as a 
marginal model from the Poisson-Markoff model for regions $R_1$ and $R_2$). 

Then 
\[ \rho(\tau) = e^{-p\lambda\tau}, \quad \text{in case (i)}, \quad (7a) \]
\[ \rho(\tau) = e^{-p\lambda\tau}\cosh\lambda\tau, \quad \text{in case (ii)}. \quad (7b) \]

Since $\lambda \neq 0$, $e^{-p\lambda\tau} \neq e^{-p\lambda\tau}\cosh\lambda\tau$. 

Hence the inconsistency of the Poisson-Markoff model is evident. 
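
The dependence of the auto-correlation on the mode of division is easy to exhibit numerically. The sketch below is an illustration under the assumptions of § 1.5 (equal rectangular regions, $p = 2(1 + p')$, latent roots $\lambda(p - 2\cos\omega_r)$ and $-2\lambda p'\cos\omega_r'$); the function name `autocorr` and the parameter values are hypothetical. It evaluates the product form of the auto-correlation for a region in an arbitrary $k \times k'$ network and compares the single-region case with the two-region case.

```python
import math


def autocorr(l, l_prime, k, k_prime, tau, lam, p_prime):
    """Auto-correlation at lag tau of the count in the region in column l and
    row l_prime of a network of k columns and k_prime rows."""
    p = 2.0 * (1.0 + p_prime)
    across = 0.0
    for r in range(1, k + 1):
        w = r * math.pi / (k + 1)
        across += math.sin(l * w) ** 2 * math.exp(-lam * tau * (p - 2.0 * math.cos(w)))
    down = 0.0
    for r in range(1, k_prime + 1):
        w = r * math.pi / (k_prime + 1)
        down += math.sin(l_prime * w) ** 2 * math.exp(2.0 * lam * p_prime * tau * math.cos(w))
    return (2.0 / (k + 1)) * across * (2.0 / (k_prime + 1)) * down


lam, p_prime, tau = 0.3, 1.0, 1.0
p = 2.0 * (1.0 + p_prime)
alone = autocorr(1, 1, 1, 1, tau, lam, p_prime)      # region considered by itself
marginal = autocorr(1, 1, 2, 1, tau, lam, p_prime)   # same region, as one of two
print(alone, math.exp(-p * lam * tau))
print(marginal, math.exp(-p * lam * tau) * math.cosh(lam * tau))
```

The two printed pairs agree with the closed forms for cases (i) and (ii), and the two cases differ from each other for every $\tau > 0$, which is the inconsistency in question.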

It follows, therefore, that there does not exist any stochastic motion that leads to the 
Poisson-Markoff process for $N_t$ valid for all modes of division of the space. However, the 
kinematic hypothesis ensures that, for small $\tau$, the Poisson-Markoff process is consistent 
to $O(\tau)$; in the example above, $\cosh\lambda\tau = 1 + O(\tau^2)$, so the two expressions for the 
auto-correlation agree to the first order in $\tau$. Also in certain situations at least, the 
Poisson-Markoff process might be an adequate approximation of a consistent process for $N_t$ 
governing any particular data of density fluctuations. This contention involves the question 
of goodness of fit and is examined in Part II of this paper, in relation to some actual 
experimental data. 


PART II 

2.1. Goodness of fit tests. The multivariate Poisson-Markoff model for $N_t$, for large 
mean number density, degenerates into a normal Markoff model. Hence it can be 
represented asymptotically as a multivariate autoregressive model of the first order and the 
asymptotic $\chi^2$-goodness of fit test based on serial correlations (the G-test or the H-test) 
(Bartlett & Rajalakshman, 1953; Quenouille, 1947) can then be applied. 

Thus denoting the conditional expectation of $N_t$ for given $N_0$ at time $t_0$, $t > t_0$, by $E(N_t \mid N_0)$, 
we have from (2), § 1.2, 

\[ E(N_t - m \mid N_0) = P(t - t_0)'\,(N_0 - m), \quad (1) \]

a linear regression formula. Also 

\[ E[(N_t - m)(N_{t+\tau} - m)'] = MP(\tau), \quad (2) \]

M being the diagonal matrix $(m_1, m_2, \ldots, m_k)$. Therefore, the model may be specified by 
the stochastic equations 

\[ (N_t - m) - P(t - t_0)'\,(N_0 - m) = \epsilon_t, \quad (3) \]

where the probability distribution of $\epsilon_t$ may be written down from that of the process $\{N_t\}$. 
In particular, when $t_0 = t - \tau$, $\tau$ being fixed, and t takes the values $a + r\tau$ $(r = 0, \pm 1, \pm 2, \ldots)$, 

\[ E(\epsilon_t) = (0) \quad \text{and} \quad E(\epsilon_t\epsilon_{t'}') = (0) \quad (t \neq t'). \]

As $m_0 \to \infty$, the Poisson-Markoff model tends to a normal Markoff model and the variates 
$\epsilon_t$ tend to be normal and to be independently distributed of $\epsilon_{t'}$ $(t \neq t')$. In this case, the 
asymptotic H-test or the G-test can be used to assess the goodness of fit of the model 
to any data consisting of a large number of observations. For finite $m_0$, $\epsilon_t$ and $\epsilon_{t'}$ $(t \neq t')$ are 
not distributed independently, as an examination of the joint distribution of $\epsilon_t$ and $\epsilon_{t'}$ 
would show. The effect of this dependence on the G-test or the H-test can, however, be 
assessed. This is done in § 2.2. In § 2.3 is given the summary of the asymptotic $\chi^2$-goodness 
of fit test applied to bull spermatozoa data. 

2.2. The effect of the dependence of the residuals $\epsilon_t$ for finite $m_0$ on the G-test. We consider 
the one-variate case first. From equation (3) we have 

\[ (N_t - m) - \rho(N_{t-1} - m) = \epsilon_t, \]

the interval of time between consecutive observations being taken as the unit time and 
$\rho = P(1) = e^{-\lambda}$, where the probability of one particle leaving R in time dt is $\lambda\,dt + o(dt)$. 
This can be written as 

\[ HX_t = \epsilon_t, \]

where $H = 1 - \rho E^{-1}$ and $X_t = N_t - m$, $E^{-1}$ being the displacement operator [...]. 
Since $E(X_{t+u}X_t) = \rho^u m$, it follows that 

\[ E(\epsilon_{t+u}X_t) = 0 \quad (u > 0) \]

and 

\[ E(\epsilon_u\epsilon_w) = m(1 - \rho^2)\,\delta(u - w), \]

where $\delta(u - w) = 0$ $(u \neq w)$; $= 1$ $(u = w)$. 

Let $D_t = $ [...]. Then [...]. Further it can be shown that [...]. Hence [...]. 

The effect of the dependence of the residuals $\epsilon_t$ is, therefore, only to raise the variance of 
[...] (cf. equation (14) [...]), since $\rho > 0$ here, while [...] are still asymptotically 
uncorrelated when $t \neq \tau$. The proportionate increase in the variance of [...] is less than $1/m$ 
and hence tends to zero, as it must, when $m \to \infty$. 

Working on the same lines for the Poisson-Markoff model for $N_t$ in two contiguous equal 
regions we get results similar to those in the one-variate case, namely, only the variances 
of the Q's (the linear combinations of the serial correlation coefficients for the G-test) [...] 
increased [...] simultaneous covariances of the Q's are altered, both the proportionate 
[...] less than $1/m_0$. This indicates that similar results may be expected for the 
multivariate Poisson-Markoff model for $N_t$ in general. 

Similarly, the effect of the dependence of the residuals on the H-test can be determined 
and found to be negligible for large $m_0$. 

2.3. Results of the G-test applied to spermatozoa data. The data consist of the series of 
observed number of spermatozoa in 20 equal squares† forming a $4h \times 5h$ rectangle, observed 
at regular intervals of time, in film 186 of Lord Rothschild's data of bull spermatozoa. (For 
detailed description of the data see Rothschild (1953b).) For the purpose of the test, the 
twenty squares are divided into four groups of five squares each forming four rectangles in 
a line and the observations in the five squares of each of the groups are pooled. The interval 
of time between consecutive observations is taken as the unit of time. 

† Actually the twenty subregions are rectangles 63.4 × 67.8 μ. 


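For this case, 

\[
\Lambda = \lambda
\begin{pmatrix}
12/5 & -1 & 0 & 0 \\
-1 & 12/5 & -1 & 0 \\
0 & -1 & 12/5 & -1 \\
0 & 0 & -1 & 12/5
\end{pmatrix}.
\]

Equation (3) of § 2.1, viz. 

\[ (N_t - m) - P(1)\,(N_{t-1} - m) = \epsilon_t, \quad (10) \]

where $\epsilon_t$ is for large $m_0$ a normal residual vector distributed independently of $\epsilon_{t'}$ and 
$P(1) = \Theta_4'e^{-K_4}\Theta_4$, where $\Theta_4$ and $K_4$ are as defined in § 1.2, is transformed to 

\[ \Delta_t - e^{-K_4}\Delta_{t-1} = \eta_t, \quad (11) \]

where $\Theta_4(N_t - m) = \Delta_t$ and $\eta_t = \Theta_4\epsilon_t$. $\eta_t$ and $\eta_{t'}$ $(t \neq t')$ are now asymptotically 
independently distributed. Let the vector $\Delta_t$ have $A_t$, $B_t$, $C_t$ and $D_t$ as its elements. The correlation 
coefficients between $A_t$, $A_{t+\tau}$ and $A_t$, $B_{t+\tau}$ are respectively denoted by $\rho_{AA}(\tau)$, $\rho_{AB}(\tau)$ and 
similar notations are used for the other auto- and cross-correlations. It can easily be seen 
that the cross-correlations are zero and $\rho_{AA}(\tau) = e^{-\kappa_1\tau}$. The corresponding serial auto- and 
cross-correlations are denoted by $r_{AA}(\tau)$, $r_{AB}(\tau)$, etc., and the expressions used for them are, 
e.g. 

\[
r_{AB}(\tau) = \frac{\sum_{t=1}^{n-\tau}(a_t - \bar a)(b_{t+\tau} - \bar b)/(n - \tau)}
{\big[\sum_t (a_t - \bar a)^2/n \cdot \sum_t (b_t - \bar b)^2/n\big]^{1/2}},
\]

where $a_t$ and $b_t$ are the observed values corresponding to $A_t$ and $B_t$ respectively, 
$\bar a = \sum a_t/n$ and $\bar b = \sum b_t/n$. 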
For the data, n = 100. The $\chi^2$'s for the G-test are 

\[ \chi_1^2 = \sum_{l \text{ and } m}\frac{[r_{lm}(1) - r_{lm}(0)\rho_{mm}(1)]^2}{(1 - \rho_{mm}^2(1))/n} \]

and 

\[ \chi_2^2 = \sum_{l \text{ and } m}\frac{[r_{lm}(2) - r_{lm}(1)(\rho_{ll}(1) + \rho_{mm}(1)) + r_{lm}(0)\rho_{ll}(1)\rho_{mm}(1)]^2}{(1 - \rho_{ll}^2(1))(1 - \rho_{mm}^2(1))/n}, \]

where the summations are over l and m, l and m becoming A, B, C and D. 

The observed mean densities for the four rectangles are 32.17, [...], 31.35, 36.37, the 
interval of time between consecutive counts is 0.4 sec. and $h \approx 65.5\ \mu$. 

The values of $\chi_1^2$ and $\chi_2^2$ are given in the first row of Table 1, the estimate used for $\rho$ being 

\[ r = \frac{\sum\{\hat\rho_A/\operatorname{var}(\hat\rho_A)\}}{\sum\{1/\operatorname{var}(\hat\rho_A)\}}, \quad (12) \]

where the summation is over A, B, C and D, 

\[ \hat\rho_A = (r_{AA}(1))^{1/s_A}, \]

and $s_A = \tfrac{12}{5} - 2\cos\omega_1$, $s_B = \tfrac{12}{5} - 2\cos\omega_2$, $s_C = \tfrac{12}{5} - 2\cos\omega_3$, $s_D = \tfrac{12}{5} - 2\cos\omega_4$, $\omega_r$ being 
equal to $r\pi/5$. It may be noted that $s_A = \kappa_1/\lambda$, $s_B = \kappa_2/\lambda$, etc. 
Other values of $\rho$ are also used to calculate $\chi_1^2$ and $\chi_2^2$, and in rows 2 and 3 in 
Table 1 are given the values of $\chi_1^2$ and $\chi_2^2$ for those values of $\rho$ for which $\chi_1^2 + \chi_2^2$ appeared to 
be a minimum. (For individual correlations see Table 4 given in Appendix 3.) The $\chi^2$'s are 
highly significant, indicating a departure of the data from the four-variate Poisson- 
Markoff model. 


Table 1 

Film 186                                               chi_1^2   chi_2^2   chi_1^2 + chi_2^2 
rho = r = 0.9208 as obtained from equation (12)           76       310        386 
D.F.                                                      15        16         31 
rho = [...]  (for which entry in col. 3 appears          105        71        176 
rho = 0.84    to be minimum)                              94        81        175 
D.F.                                                      15        16         31 

The G-test in the one-variate case. The data used for the one-variate case are the five 
independent series of observations of spermatozoa counts in squares (film 191, shot 1, shot 3, 
shot 4, shot 5, shot 6 of Lord Rothschild's data of bull spermatozoa). The G-test is applied 
to each of the series separately. m is estimated by the sample mean and $\rho$, either by (i) r, 
the serial correlation coefficient, or by (ii) 

\[ r' = 1 - \frac{\sum_{t=2}^{n}(n_t - n_{t-1})^2/(n - 1)}{2\sum_{t=1}^{n}n_t/n}. \]

It should be noticed that the variances of r and $r'$ are, to the leading order for large m, 

\[ \operatorname{var}(r) \sim \frac{1}{n}(1 - \rho^2) + O\Big(\frac{1}{mn}\Big) \]

and 

\[ \operatorname{var}(r') \sim \frac{1}{n}\frac{(3 + \rho)(1 - \rho)^2}{1 + \rho} + O\Big(\frac{1}{mn}\Big) \]

(Lindley, 1954; Ruben & Rothschild, 1955). Hence when $\rho > \sqrt{2} - 1$, $\operatorname{var}(r') < \operatorname{var}(r)$ and 
therefore $r'$ is a more efficient estimate of $\rho$ than r. 
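
The two estimates of $\rho$ can be computed from a series of counts as follows. This is a minimal sketch, assuming the serial covariance is divided by n − 1 at lag 1 and the variance by n (the convention used for the serial correlations in § 2.3), and assuming $r'$ is one minus the mean squared successive difference divided by twice the sample mean. The function names and the short series are hypothetical illustrations, not data from the paper.

```python
def serial_correlation(x, lag=1):
    """Serial correlation coefficient at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    c_lag = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / (n - lag)
    return c_lag / c0


def r_prime(x):
    """Estimate of rho that uses the Poisson property variance = mean."""
    n = len(x)
    mean_sq_diff = sum((x[t] - x[t - 1]) ** 2 for t in range(1, n)) / (n - 1)
    return 1.0 - mean_sq_diff / (2.0 * sum(x) / n)


counts = [3, 5, 4, 6, 5, 7, 6, 8]   # hypothetical counts
print(serial_correlation(counts), r_prime(counts))
```

Since $E(n_t - n_{t-1})^2 = 2m(1 - \rho)$ when the variance equals the mean m, $r'$ is consistent only under the Poisson assumption, whereas r does not use it; a marked difference between the two is therefore itself informative about the variance/mean ratio.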


The values of 

\[ \chi^2 = \frac{n}{(1 - \rho^2)^2}\sum_{t=2}^{6}(r_t - 2\rho r_{t-1} + \rho^2 r_{t-2})^2 \]

are given in Table 2 for the two cases (i) and (ii). When r is used as the estimate of $\rho$, the 
$\chi^2$-test does not indicate a significant departure of the data from the Poisson-Markoff 
model, whereas it does so when $r'$ is used as the estimate of $\rho$. This should not be the case if 
the Poisson-Markoff hypothesis is true, since the errors of both the estimates r and $r'$ are 
$O(n^{-1/2})$. On the other hand, if the Poisson-Markoff hypothesis is not valid, whether the 
discrepancy is brought out or not may depend upon the estimate used. To examine this 
question we may test whether the sample variance/mean ratio, $(C_0/\bar n)$, which is asymptotically 
normally distributed, differs significantly from unity, the theoretical value assumed in the 
use of $r'$. On the hypothesis of the Poisson-Markoff model 

\[ E\Big(\frac{C_0}{\bar n}\Big) \sim 1 - \frac{1}{n}\frac{1 + \rho}{1 - \rho} \quad \text{and} \quad \operatorname{var}\Big(\frac{C_0}{\bar n}\Big) \sim [\ldots]. \quad (13) \]

† In both the uniregional and the multiregional cases the m's are fairly large and hence the effect 
of finite m on the tests has been neglected. 
Table 2 

Film 191                              Shot 1    Shot 3    Shot 4    Shot 5    Shot 6    Total 
Estimate of rho = r                   0.5426    0.3755    0.5606    0.4038    0.4313 
chi^2                                  4.1       1.6       1.1       4.1       3.3      14.2 
Estimate of rho = r'                  0.7644    0.6373    0.6596    0.6943    0.7714 
chi^2                                 18.7       6.4       1.9      16.2      39.1      82.3 
D.F.                                    5         5         5         5         5        25 
m = sample mean                      24.9911   16.0446   15.8850   12.8889   20.7719 
n = no. of observations               112       112       113       117       114 
tau = interval of time between        0.2279    0.2288    0.2250    0.2205    0.2248 
  consecutive observations in sec. 
Table 3 

Film 191 | Shot 1 | Shot 3 | Shot 4 | Shot 5 | Shot 6 | $\sqrt{5}\,\bar C$ | $t_c$ 
C: [...] -0.6, -1.8, -1.9, -3.3, -6.2 

$\bar C$ = A.M. of the C's for the five series. $t_c = \sqrt{5}\,\bar C/\sqrt{[\sum(C - \bar C)^2/4]}$. 


This follows from 

\[ E(\bar n) = m, \qquad \operatorname{var}(\bar n) \sim \frac{m}{n}\frac{1 + \rho}{1 - \rho}, \]

\[ E(C_0) \sim m\Big[1 - \frac{1}{n}\frac{1 + \rho}{1 - \rho}\Big], \qquad \operatorname{var}(C_0) \sim [\ldots], \]

and using the method of statistical differentials to obtain $E(C_0/\bar n)$ and $\operatorname{var}(C_0/\bar n)$. Table 3 
gives the values of 

\[ C = \frac{C_0/\bar n - E(C_0/\bar n)}{\sqrt{\operatorname{var}(C_0/\bar n)}}, \]

with $\rho$ estimated by r, for the five series. The variate $\sqrt{5}\,\bar C$ is asymptotically normal with 
mean zero and unit variance, while $t_c$ is a t-variate with 4 d.f. Both are here significantly large, 
bringing out a significant deviation of the data from the one-variate Poisson-Markoff model with 
particular reference to the variance/mean ratio. 
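
The one-variate G-test statistic can be recomputed from the serial correlations listed in Table 5. The sketch below assumes the statistic takes the first-order autoregressive form $n\sum_{t=2}^{6}(r_t - 2\rho r_{t-1} + \rho^2 r_{t-2})^2/(1 - \rho^2)^2$ with $r_0 = 1$; this form is a reconstruction, adopted here because it gives about 4.1 and 18.7 for Shot 1, the values tabulated in Table 2. The function name `g_test_chi2` is hypothetical.

```python
def g_test_chi2(r, rho, n):
    """Chi-square statistic of the assumed first-order G-test.

    r   : serial correlations r_1, ..., r_L
    rho : estimate of the first-order parameter
    n   : number of observations
    Returns a statistic on L - 1 degrees of freedom.
    """
    full = [1.0] + list(r)  # prepend r_0 = 1
    total = 0.0
    for t in range(2, len(full)):
        total += (full[t] - 2.0 * rho * full[t - 1] + rho ** 2 * full[t - 2]) ** 2
    return n * total / (1.0 - rho ** 2) ** 2


# Shot 1 of film 191: r_1, ..., r_6 from Table 5, n = 112 from Table 2.
shot1 = [0.5426, 0.3109, 0.1976, 0.0588, 0.0560, -0.0608]
print(g_test_chi2(shot1, rho=0.5426, n=112))   # rho estimated by r
print(g_test_chi2(shot1, rho=0.7644, n=112))   # rho estimated by r'
```

The same function applied to the other four rows of Table 5 gives the remaining entries of Table 2 to the accuracy printed there, which is how the signs of the smaller correlations in Table 5 can be checked.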

The Poisson-Markoff model, therefore, does not appear to give a good fit either to the 
multiregional or to the uniregional bull spermatozoa data considered here. In the uni- 
regional case the evidence for the departure of the data from the Poisson-Markoff model is 
not as striking as in the multiregional case, since the value of the pooled $\chi^2$'s for the five 
series is not significantly large when the serial correlation coefficient is used as an estimate 
of $\rho$.† 

Since the Poisson-Markoff model is consistent only over small total intervals of time, it 
is desirable to specify the stochastic motion fully and then to base the stochastic models 
for $N_t$ on it so that these models would be free from contradictions. The stochastic models 
for $N_t$ are, in general, not Markovian (Bartlett, 1954). However, the closeness of 
approximation of the Poisson-Markoff model to such consistent models may then be studied, and 
the Poisson-Markoff model, which is comparatively simple, can be used as an approximate 
model in place of the original one, if the closeness of the approximation is found adequate 
for the particular situation under consideration. An investigation of some specific models 
for $N_t$ based on fully specified stochastic motion has been carried out and it is hoped to 
publish an account of this work in due course. 


I am extremely grateful to Prof. M. S. Bartlett for his valuable guidance. I am indebted 
to Lord Rothschild for making the spermatozoa data available to me; to Lord Rothschild 
and Dr H. Ruben and to Mr D. V. Lindley for allowing me to read the draft of their papers 
before their publication. I thank the referees for detailed suggestions regarding the first 
and second parts. 
APPENDIX 1 

Proof of formula (2), § 1.2 

The infinitesimal probabilities of transition from one state to the others of the process $\{N_t\}$ are 
expressed in tabular form below. Here we have written $n_r$ for $N_{r,t}$. 

Transition from state at time t            Probability of 
to state at time t + dt                    the transition 
n_r -> n_r + 1                             mu_r dt + o(dt) 
n_r -> n_r - 1                             n_r lambda*_r dt + o(dt) 
n_r -> n_r - 1, n_s -> n_s + 1             n_r lambda_rs dt + o(dt) 
all other transitions from one             o(dt) 
  state to a different state 
Hence no change                            1 - sum_r mu_r dt - sum_r lambda*_r n_r dt 
                                             - sum_{r != s} lambda_rs n_r dt - o(dt) 

Then the partial differential equation for $\Pi_t(z)$, the p.g.f. of $N_t$, obtained as in Bartlett (1949) by 
taking into consideration all these possible transitions is 

\[
\frac{\partial\Pi_t}{\partial t} = \sum_r \mu_r(z_r - 1)\Pi_t + \sum_r \lambda_r^*(1 - z_r)\frac{\partial\Pi_t}{\partial z_r}
+ \sum_{r \neq s}\lambda_{rs}(z_s - z_r)\frac{\partial\Pi_t}{\partial z_r}. \quad (1)
\]

† The results for this particular test, which was made first, were quoted by Prof. Bartlett in his 
book An Introduction to Stochastic Processes (1955, p. 274); the other results had not then been 
obtained. 
The auxiliary equations of this are 

\[
\frac{dt}{1} = \frac{dz_r}{\sum_{s \neq r}\lambda_{rs}(z_r - z_s) - \lambda_r^*(1 - z_r)} = \frac{d\Pi_t}{\Pi_t\sum_r \mu_r(z_r - 1)} \quad (r = 1, \ldots, k). \quad (2)
\]

With the matrix $\Lambda$ as given in § 1.2, let the vector $m' = (m_1, m_2, \ldots, m_k)$ be defined by 

\[ \mu' = m'\Lambda. \quad (3) \]

(It should be noted that $|\Lambda| = \kappa_1\kappa_2\cdots\kappa_k \neq 0$; see the remark on the latent roots following (12).) 
For any linearly transformed variate $\alpha'(z - 1)$, equations (2) give 

\[ \frac{d\Pi}{\Pi\,\mu'(z - 1)} = \frac{d\alpha'(z - 1)}{\alpha'\Lambda(z - 1)} \quad (4) \]

and 

\[ \frac{dt}{1} = \frac{d\alpha'(z - 1)}{\alpha'\Lambda(z - 1)}. \quad (5) \]

First take $\alpha$ in (4) as identical with m, and then, with the use of (3), we get 

\[ \frac{d\Pi}{\Pi} = \frac{dm'(z - 1)}{1}, \quad (6) \]

so that $\Pi e^{-m'(z-1)} = K_1$, where $K_1$ is any arbitrary constant. Next take $\alpha'$ as identical with $n_r'$, the latent 
row vector of $\Lambda$ corresponding to the latent root $\kappa_r$, and equation (5) becomes 

\[ \frac{dt}{1} = \frac{dn_r'(z - 1)}{\kappa_r n_r'(z - 1)}, \]

whence 

\[ n_r'(z - 1)\,e^{-\kappa_r t} = \text{constant} \quad (r = 1, \ldots, k). \quad (7) \]

The general solution of equation (1) is therefore 

\[ \Pi_t(z)\,e^{-m'(z-1)} = \psi\big(n_1'(z - 1)e^{-\kappa_1 t},\ n_2'(z - 1)e^{-\kappa_2 t},\ \ldots,\ n_k'(z - 1)e^{-\kappa_k t}\big), \quad (8) \]

where $\psi$ is an arbitrary function, the form of which may be determined from the initial conditions. If 
initially the numbers of particles in the regions $R_1, \ldots, R_k$ are $n_1, \ldots, n_k$ respectively, then taking the 
initial instant to correspond to t = 0, 

\[ \Pi_0(z) = \prod_{r=1}^{k}z_r^{n_r}. \]

Hence, putting t = 0 in (8), 

\[ \prod_{r=1}^{k}z_r^{n_r}\,e^{-m'(z-1)} = \psi\big(n_1'(z - 1), \ldots, n_k'(z - 1)\big). \quad (9) \]

Now with $\Theta$ defined as in § 1.2 let the vector y be defined by 

\[ \Theta(z - 1) = y \quad \text{or equivalently} \quad z - 1 = \Theta^{-1}y, \quad (10) \]

and let $\xi_r'$ be the row vector of $\Theta^{-1}$ corresponding to $(z_r - 1)$. Then equation (9) can be written as 

\[ \psi(y_1, \ldots, y_k) = e^{-m'\Theta^{-1}y}\prod_{r=1}^{k}(1 + \xi_r'y)^{n_r}. \quad (11) \]

With this expression for $\psi$, equation (8) becomes 

\[
\begin{aligned}
\Pi_t(z \mid n)\,e^{-m'(z-1)} &= \exp\{-m'\Theta^{-1}e^{-Kt}\Theta(z - 1)\}\prod_{r=1}^{k}[1 + \xi_r'e^{-Kt}\Theta(z - 1)]^{n_r} \\
&= \exp\{-m'\Theta^{-1}e^{-Kt}\Theta(z - 1)\}\prod_{r=1}^{k}[1 + G_r(z - 1)]^{n_r},
\end{aligned}
\quad (12)
\]

where $G_r$ is the rth row vector of the matrix 

\[ P(t) = \Theta^{-1}e^{-Kt}\Theta. \]

It can be shown by expanding $|\Lambda|$ in terms of $\lambda^*$'s in the diagonal elements and employing the method of 
mathematical induction that all the principal minors of $\Lambda$ are positive, i.e. $\Lambda$ is positive-definite. Hence all the latent 
roots of $\Lambda$ are positive (cf. Bartlett, 1949). Therefore when $t \to \infty$, $e^{-\kappa_s t} \to 0$, for all s, 
and (12) gives immediately 

\[ \Pi_\infty(z \mid n) = e^{m'(z-1)}. \quad (13) \]

This probability distribution is independent of the initial value of $N_t$, i.e. the value of $N_t$ at t = 0. Hence 
in statistical equilibrium, 

\[ \Pi_t(z) = e^{m'(z-1)}. \quad (14) \]

From formulae (12) and (14), expression (2) § 1.2 for the p.g.f. of $N_t$ and $N_{t+\tau}$ may be obtained. 
APPENDIX 2 


The proof that in two dimensions the collision frequency is $m_0\bar{c}/\pi$; also the proof of formulae (4) § 1·3.

Let c be the vector representing the velocity of a particle in two dimensions, and f(c) the probability density for the velocity. Consider the number N of particles passing in time dt from one side of an infinitesimal element of length ds to the other. Then it is easily seen that

$$N = \int_{c_y \ge 0} m_0\, ds\, c_y\, dt\, f(\mathbf{c})\, d\mathbf{c}, \qquad (1)$$

where the y-axis is taken along the normal to ds, the positive direction pointing towards that side of ds to which the particles move. On transforming to polar coordinates c and θ given by $c_x = c\cos\theta$, $c_y = c\sin\theta$, and integrating over θ, equation (1) becomes

$$N = \frac{m_0\, ds\, dt}{\pi}\int_0^\infty c f(c)\, dc = \frac{m_0\bar{c}}{\pi}\, ds\, dt.$$

Hence the number of particles crossing a unit length in unit time is given by

$$\frac{m_0\bar{c}}{\pi}. \qquad (2)$$
Therefore m,P,s(dt) = = B, st. 
But with the assumption of a uniform mean density $m_0$,
m, = mår (3) 
Hona NES 
lence P,,(dt) = ma (4) 
Sim; cL, 
imilarly P*(dt) = —,- dt. (5) 
7A, 


Noting that in a state of macroscopic equilibrium the mean number of particles coming into a region $R_s$ from R* and the other regions in time dt balances the mean number of particles going out of $R_s$ to R* and the other regions in the same time, and using the equations (3), we obtain the relation
uw = mA. (6) 
APPENDIX 3 
Tables of serial and cross-correlations 
Table 4. Values of r(T) for T = 0, 1, 2, for the series At (defined in § 2:3) 

Film 186 To ra ro Film 186 To Ta Ta # 
o vum | evan | own | tee | iss | tas "0447 
TAB 2 —0-5124 | — 0-4939 TCB 2 

— 0:5220 1-0000 0:8914 0:8461 
"40 0.2688 | — 03259 E ae 0-2682 0-2617 0:2864 
"aD — 0-4906 — 0:4987 —04 D -0 

7 " — 0-4906 0-5356 — 0:5269 
TBA E — 0-4846 — 0-5877 TDA E ; K 
TBB bee 0-8265 0:7316 "DB ee pee erm 
Bc 0-4445 0-4486 04509: | rpe 1-0000 0-8725 0-8102 
Trp 0-5538 0-5801 0:5453 TDD 
I — 
Table 5. Values of $r_T$ for T = 1, 2, ..., 6, for the five series of observations
(Film 191, shot 1, shot 3, shot 4, shot 5, shot 6 of Rothschild's data)


| | n 
Ti fa T3 | fi & | fs 
| H — 
á - =" i 
Shot 1 | 0:5426 0:3109 0:1976 0:0588 0:0560 i= 0-0608 
Shot 3 0:3755 0-1013 0-0784 | 0-1050 0-0258 | 0:0208 
Shot 4 0:5606 0-3280 0-2026 | 01806 | 0:1235 0:1096 
Shot 5 0-4038 0-2055 — 0-0041 — 0-1248 — 0:1107 — 0:0043 
Shot 6 0:4313 0:1976 0-1617 — 0:0033 | — 0:0520 0-0024 
TESTING FOR SERIAL CORRELATION IN LEAST SQUARES REGRESSION


By E. J. HANNAN 


Australian National University, Canberra, A.C.T. 


1. INTRODUCTION


We shall be concerned with a regression of the form

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{11} & x_{21} & \cdots & x_{k1} \\ x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & & \vdots \\ x_{1n} & x_{2n} & \cdots & x_{kn} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}, \qquad (1)$$

or $y = X\beta + \epsilon$,

where the $\epsilon_t$ are generated by a stationary process, wholly independent of the vectors in X, and have correlation matrix Γ.

In a fundamental paper Anderson (1948) considered the case where the $\epsilon_t$ were generated by certain Gaussian processes giving a joint distribution near to that for a stationary simple Markoff process. For the cases he considered, when the column vectors of X are latent vectors of Γ, a uniformly most powerful (one-sided) test of the hypothesis of serial independence of the $\epsilon_t$ may be obtained from the ratio of a quadratic form in the residuals to the sum of squares of the residuals. The matrix of the quadratic form in the numerator also has the vectors in X as latent vectors and is close to the matrix occurring in the definition of the serial correlation coefficient. His results suggest the use of some form of the serial correlation coefficient as a test statistic in the general case when the regressor vectors are no longer latent vectors of Γ. Durbin & Watson (1950, 1951) use, for example, the statistic

$$d = \frac{z'A_d z}{z'z}, \qquad (2)$$

where $z = [I - X(X'X)^{-1}X']\,y = Qy$ and

$$A_d = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}.$$

The elements of $A_d$ outside the principal diagonal, and the diagonals immediately above and below it, are zeros.

Durbin & Watson reduced d to the canonical form

$$d = \frac{\sum_{1}^{n-k} \nu_i \xi_i^2}{\sum_{1}^{n-k} \xi_i^2},$$
where the $\nu_i$ are the latent roots of $QA_d$ (other than k zeros) and the $\xi_i$ are independent standard normal variates with zero mean when the $\epsilon_t$ are independent normal variates. They showed, moreover, that, ordering the $\nu_i$ with respect to size,

$$\lambda_i \le \nu_i \le \lambda_{i+k-s},$$

where the $\lambda_i$ are the latent roots of $A_d$ (also ordered with respect to size) other than s which correspond to regressor vectors which are latent vectors of $A_d$.
They then tabulated, for various n and k, the significance points of

$$d_L = \frac{\sum_{1}^{n-k} \lambda_i \xi_i^2}{\sum_{1}^{n-k} \xi_i^2} \quad \text{and} \quad d_U = \frac{\sum_{1}^{n-k} \lambda_{i+k-1} \xi_i^2}{\sum_{1}^{n-k} \xi_i^2}, \qquad (3)$$

which correspond to the case where s = 1, and the only latent vector is that vector composed entirely of units, which corresponds to the constant term in the regression. These significance points then provide bounds to the significance point for d.
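The two forms of the statistic d in (2), the quadratic form in the residuals and the ratio of the sum of squared successive differences to the sum of squares, can be checked numerically. The following is a minimal sketch only: the function names and the simulated data are assumptions introduced for illustration and do not come from the paper.

```python
import numpy as np

def dw_matrix(n):
    """A_d: 2 on the diagonal (1 at both ends), -1 on the adjacent diagonals."""
    a = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    a[0, 0] = a[-1, -1] = 1.0
    return a

def durbin_watson(y, x):
    """Return (d, z) with z = Qy the least squares residuals and d = z'A_d z / z'z."""
    n = len(y)
    q = np.eye(n) - x @ np.linalg.solve(x.T @ x, x.T)   # Q = I - X(X'X)^{-1}X'
    z = q @ y
    return (z @ dw_matrix(n) @ z) / (z @ z), z

rng = np.random.default_rng(0)            # simulated data, for illustration only
n = 30
x = np.column_stack([np.ones(n), np.arange(1.0, n + 1)])
y = x @ np.array([1.0, 0.5]) + rng.standard_normal(n)

d, z = durbin_watson(y, x)
d_diff = np.sum(np.diff(z) ** 2) / np.sum(z ** 2)   # same statistic from differences
```

Since the largest latent root of $A_d$ is below 4, d necessarily lies between 0 and 4.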

In general the $\nu_i$ depend upon the regressor vectors, the only simple case being, again, that where the regressors are latent vectors of $A_d$. In cases where the bounds test is inconclusive they recommend the use of ¼d as a Beta variate with the appropriate mean and variance.

An alternative test was suggested in Hannan (1955), but, while this test is asymptotically fully efficient, when n is small and k large relative to n end-effects reduce its power very greatly. In these circumstances the bounds test is also unsatisfactory, however, since the bounds are very far apart. For example, with n = 30 and k = 6 the bounds are 1·07 and 1·83, although the range of variation of d is only from 0 to 4 (approximately).

As has already been seen, the latent vector case is of primary importance in connexion with the test for serial independence of the residuals. It is also of great importance in the estimation of β, for when X is composed of latent vectors of Γ it is well known that the straightforward least squares (L.S.) estimates are, numerically, the same as the best linear unbiased (B.L.U.) estimates (obtained when Γ is known). Grenander (1954) and Grenander & Rosenblatt (1954) have investigated the conditions under which the two differing estimation procedures give, asymptotically, estimates having the same covariance matrix, no matter what the stationary process generating the $\epsilon_t$ may be. Their conditions, which will be given in more detail in the next section, are met by [...] functions (for equidistant values of the argument), including the orthogonal polynomials, the trigonometric functions and functions of the form [...]. The analogy with the latent vector case suggests that when Grenander's theorem applies the distribution theory of statistics such as d can be simplified and this is investigated.

Most attention will be paid to the orthogonal polynomials since the regression on trigonometric functions has been dealt with exactly by Anderson & Anderson (1950) and other cases where Grenander & Rosenblatt's theorem holds seem less interesting. It should be mentioned that much of what is contained in § 3 (relating to the orthogonal polynomials) is already implicit in Durbin & Watson's work. (See, for example, Durbin & Watson (1951), pp. 171 and 172.)

In the final section of the paper the problem of testing for serial correlation in the residuals from a regression which includes trend terms as well as other regressors is considered.
2. REGRESSORS WHOSE SPECTRAL DISTRIBUTION FUNCTION IS A STEP FUNCTION

Grenander (1954) and Grenander & Rosenblatt (1954) consider the case of regression on k vectors (of n elements each), $\phi_1, \phi_2, \dots, \phi_k$, generated by a process such that

$$\text{(i)}\quad \lim_{n\to\infty} \sum_{t=1}^{n} \phi_\nu^2(t) = \infty \quad (\nu = 1, \dots, k),$$

$$\text{(ii)}\quad \lim_{n\to\infty} \frac{\sum_{t=1}^{n+1} \phi_\nu^2(t)}{\sum_{t=1}^{n} \phi_\nu^2(t)} = 1 \quad (\nu = 1, \dots, k), \qquad (4)$$

$$\text{(iii)}\quad \lim_{n\to\infty} \frac{\sum_{t=1}^{n} \phi_j(t+h)\,\phi_\nu(t)}{\{\sum_{t=1}^{n} \phi_j^2(t) \sum_{t=1}^{n} \phi_\nu^2(t)\}^{1/2}} = r_{j\nu}(h) \ \text{exists} \quad (j, \nu = 1, \dots, k).$$

The $\phi_\nu(t)$ are considered merely as known sequences of real numbers, no other restrictions on the process generating them being made.
Grenander & Rosenblatt then show that

$$R_h = (r_{j\nu}(h)) = \int_{-\pi}^{\pi} e^{ih\theta}\, dM(\theta),$$

where $M(\theta) = [m_{j\nu}(\theta)]$ is a matrix of functions with $M(\theta_1) - M(\theta_2)$ (Hermitian) non-negative definite for $\theta_1 \ge \theta_2$. It is presumed that $R_0 = M(\pi) - M(-\pi)$ is non-singular (to avoid a kind of asymptotic multicollinearity). M(θ) is called the spectral distribution function of the $\phi_\nu$. Since the $\phi_\nu(t)$ are real we shall have $dM(\theta) = \overline{dM(-\theta)}$. Introducing the (matrix) function $N(\theta) = M(\theta+) - M(-\theta-)$ (which is real and symmetric) the regression spectrum S is defined as the set of points in 0 to π for which dN(θ) is not the null matrix.
It is then shown by Grenander & Rosenblatt that the necessary and sufficient condition that the L.S. and B.L.U. estimates of β should have, asymptotically, the same covariance matrix is that S should consist of $q \le k$ points of increase $\theta_1, \theta_2, \dots, \theta_q$ for which

$$dN(\theta_i)\, N(\pi)^{-1}\, dN(\theta_j) = \begin{cases} \text{null matrix} & (i \ne j), \\ dN(\theta_i) & (i = j). \end{cases}$$

If we now put $T(\theta_j) = N(\pi)^{-1}\, dN(\theta_j)$ we shall have

$$\sum_{j=1}^{q} T(\theta_j) = N(\pi)^{-1} \sum_{j=1}^{q} dN(\theta_j) = I$$

and it follows that the $T(\theta_j)$ are idempotent.

Grenander & Rosenblatt's theorem is proved under certain restrictions on the spectral function of the process generating the $\epsilon_t$. In particular, the spectral density must not have [...] at the $\theta_j$. An example where [...] is given in Grenander [...].

To apply this result to the problem of testing for serial correlation in the residuals (when on the null hypothesis the $\epsilon_t$ are independent normal variates with zero mean) let us consider the case where the test statistic is

$$r = \frac{z'Az}{z'z}.$$
It is easy to see that $\mathrm{Tr}\, T(\theta_j)$ is invariant under the orthonormalization so that this formula is perfectly general. In fact under the orthonormalization the $T(\theta_j) = N(\pi)^{-1}\, dN(\theta_j)$ will be replaced by $P'\,dN(\theta_j)\,P$, where

$$P'N(\pi)P = I.$$

It can be easily seen that the $P'\,dN(\theta_j)\,P$ annihilate each other and are idempotent. They are also symmetric and non-negative definite, however, so that each must have units in certain places in the principal diagonal and zeros elsewhere, no two having unity in the same place. (This fact could be used to obtain an alternative proof of (10).) It follows therefore that

$$\mathrm{Tr}\, P'\,dN(\theta_j)\,P = \mathrm{Tr}\, N(\pi)^{-1}\, dN(\theta_j) = \mathrm{Tr}\, T(\theta_j) = p_j \quad (p_j \text{ integral}),$$

where $\sum_{1}^{q} p_j = k$.


We then have, under the conditions of Grenander & Rosenblatt’s theorem, 


lim (n Tr (QA)} = n- m As— » 2,90,» 
n 1 


J 
1 i " a 27 
= or INT, d0 — Xp, oF 
from (8), and it can be seen that, asymptotically, the distribution of r is the same as that which would obtain if the regression vectors had been latent vectors corresponding to latent roots $g(\theta_j)$, repeated $p_j$ times. As n increases, of course, $g(\theta_j)$ will be arbitrarily near to latent roots of A.
For example, if


2n9;t " 
aj a(t) = n sin "un. Xyj(t) = le cos d 


(j = 1,....9), 


then $R_h$ has $\cos(2\pi h g_j/n)$, repeated twice, in the principal diagonal and zeros elsewhere so that dT(θ) is null except at $\theta = 2\pi g_j/n$, where it has two units in the principal diagonal and zeros elsewhere. If as A we choose the matrix occurring in the definition of the circular serial correlation coefficient (see Anderson & Anderson, 1950), we obtain

lim iTr (oA) = 1 (pe As & 2(cos me^ 

ns (n n| T ^W i n ]p 


since g(θ) now becomes cos θ. For suitable $g_j$ the statistic $r = z'Az/z'z$ could then be referred
WE 
lim E Tr (QA) =a m A£— (k4-1) X 2(00s "wn 
no N 1 n > 


and this does not correspond to any statistic whose significance points are at present 
tabulated. 
3. REGRESSION ON THE ORTHOGONAL POLYNOMIALS

Here we shall use the matrix $A_d$. Though $A_d$ does not satisfy (7) it is very near to a matrix that does and the difference will not matter asymptotically. We now have, in the limit, $g(\theta) = 2(1 - \cos\theta)$. For the orthogonal polynomials there is only one point of increase in N(θ), at the origin, for which T(0) is the unit matrix so that

$$\mathrm{Tr}\,[QA_d] \to \mathrm{Tr}\, A_d - k\,g(0) = \mathrm{Tr}\, A_d.$$

Since d is necessarily less than $d_U$ (see (3)) and the roots $\lambda_1, \lambda_2, \dots, \lambda_{k-1}$ all tend to zero as $n \to \infty$, it appears that the effect of the regression is to eliminate the k smallest roots from the spectrum of $A_d$ and the appropriate significance point is, asymptotically, that given by the upper bound to the significance point as tabulated by Durbin & Watson (1951). (Note that in their tables they put k' = k − 1.)
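The claim that regression on orthogonal polynomials removes, in effect, the k smallest roots of $A_d$ can be examined numerically for a small n. The sketch below is an illustration under stated assumptions: the helper names are invented here, the polynomial regressors are built by a QR decomposition of a Vandermonde matrix rather than from tables, and the non-zero roots of $A_d$ are compared with the known closed form $2\{1 - \cos(\pi i/n)\}$.

```python
import numpy as np

def dw_matrix(n):
    """A_d: 2 on the diagonal (1 at both ends), -1 on the adjacent diagonals."""
    a = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    a[0, 0] = a[-1, -1] = 1.0
    return a

def polynomial_regressors(n, k):
    """Orthonormal columns spanning the polynomials of degree 0, ..., k-1 in time."""
    t = np.linspace(-1.0, 1.0, n)                 # rescaled time, for conditioning
    q, _ = np.linalg.qr(np.vander(t, k, increasing=True))
    return q

def latent_roots(n, k):
    """Return (lam, nu): non-zero roots of A_d and of QA_d, in increasing order."""
    a = dw_matrix(n)
    phi = polynomial_regressors(n, k)
    q = np.eye(n) - phi @ phi.T
    lam = np.sort(np.linalg.eigvalsh(a))[1:]          # drop the zero root (constant vector)
    nu = np.sort(np.linalg.eigvalsh(q @ a @ q))[k:]   # QA_d and QA_dQ share non-zero roots
    return lam, nu

n, k = 20, 2                                   # linear trend, so k' = k - 1 = 1
lam, nu = latent_roots(n, k)
mean_d = nu.sum() / (n - k)                    # E(d) is the mean of the nu_i
mean_d_upper = lam[k - 1:].sum() / (n - k)     # E(d_U) uses the roots lam_{i+k-1}
```

For this case the two means agree to about three decimal places, which is in line with the use of the upper bound as an approximation to the true significance point.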

As a check on the adequacy of the approximation for n small the mean μ and variance σ² of d and $d_U$ have been calculated for certain n and k' (= k − 1) and are shown in Table 1.


Table 1 
N 
i 1 3 8 
u kt gs ht s h on 
ON NE El 

ta — 0.225 2:474 0-194 2-798 0-145 

1 E Men 2-491 0-190 2-860 0-137 

d. 2.110 0-177 2-345 0-164 2-578 0-147 

20 i Me 0-177 2-354 0-162 2-620 0-137 

Fy odo 0-146 2271 0:139 2-456 0-130 

= fa 2.086 0-146 2:275 0-138 2-481 0-124 

"d S 1 0-124 2-222 0:120 2.376 0-114 

30 f 2-07 Mr 9.924 0:119 2.392 0-111 

| dy 2-071 | 

It appears that the true significance point will not differ very greatly from the significance point for $d_U$. For n = 15, k' = 5 the deviation will be only about 0·06. (Here the 5% [...].) Having regard to the certainty of deviations from normality and the fact that critical decisions will rarely be made when the observed value falls very near to the significance point, it seems that the upper bound to the significance point as tabulated by Durbin & Watson may nearly always be used as an adequate approximation to the true significance point. When the observed value is very near to this upper bound the significance point could always be located more accurately by using ¼d as a Beta variate, as suggested by Durbin & Watson.

The work of Durbin & Watson showed, of course, that the bounds to the significance point differed from the true significance point by quantities of the order [...]. What we have here done is to show that in the case of the orthogonal polynomials the upper bounds differ from the true significance points by quantities of the order [...].
4. MIXED REGRESSION ON ORTHOGONAL POLYNOMIALS AND RANDOM VARIATES

A situation which is likely to arise in practice is that in which a regression is to be computed and all the variates contain a trend component. A standard procedure is then to include the orthogonal polynomials in the regression up to the order needed to eliminate the most complicated trend. This will often result in a regression on a large number of regressors, so that the bounds test for the serial correlation of the residuals will be inconclusive. For an example see Quenouille (1952, pp. 182-6).

Consider the case of a regression on k vectors $\phi_i$ satisfying the conditions of Grenander's theorem and p vectors $x_i$ which are unrestricted. We may, without loss of generality, take the whole set $\phi_i$, $x_i$ as orthonormal. Then if the non-zero latent roots of

$$\Big[I - \sum_{1}^{p} x_i x_i' - \sum_{1}^{k} \phi_i \phi_i'\Big] A = \Big[I - \sum_{1}^{p} x_i x_i'\Big]\Big[I - \sum_{1}^{k} \phi_i \phi_i'\Big] A$$

are $\mu_1, \mu_2, \dots, \mu_{n-k-p}$, while those of $\big[I - \sum_{1}^{k} \phi_i \phi_i'\big] A$ are $\nu_1, \nu_2, \dots, \nu_{n-k}$, Durbin & Watson (1950) show that, when each set is numbered so that its members are increasing,

$$\nu_i \le \mu_i \le \nu_{i+p}.$$

Thus a lower bound to $r = z'Az/z'z$ is

$$\frac{\sum_{1}^{n-k-p} \nu_i \xi_i^2}{\sum_{1}^{n-k-p} \xi_i^2}. \qquad (11)$$
We have from (10) 
|, n=k-p a p-1 
lim X »=TrAs- > {g(0;)} Tr T(0,)— 23) P 
n-o i-l 1 j= i 
n 5 q p-1 
= BA BOTT) X 
j-0 


But from the fact that Ài S Vi S Àj, 
it is clear that Y, ill di $5 i 
it is clear tha P differ from D An; by a quantity of order n- so that to our ord? 
of approximation to the moments of r, we may put 


n—k-p : n-p q 
58. s 
È i= X AÈ oye TI), 
In the case where $\mathrm{Tr}\, T(\theta_j)$ is the appropriate multiplicity of the root corresponding to $g(\theta_j)$ a lower bound to the significance point for r may therefore be obtained from
XA 
n= ANG : 
Yu 
where in Σ' the terms corresponding to the latent roots other than the p greatest and those corresponding to the $g(\theta_j)$ appear.
For the case of the orthogonal polynomials (using $A_d$) this bound becomes


n—k-p 
2 
21 AG+k-19F 
f= 
“i 77 ==" ean b 
um (13) 
A Gj 
1 


(the latent root, zero, corresponding to the constant term having been omitted, to accord with Durbin & Watson's notation, as in (3)).
This is not a true lower bound for finite n since (for example) [...]. However, the results of the previous section indicate that the error involved in using it will be small. Moreover, one can only be wrong in using it as a lower bound when the p vectors $x_i$ are near to the latent vectors corresponding to the largest latent roots of $A_d$. This would imply [...] high negative [...] and would probably be recognized [...]. However, this discussion is largely academic, since the significance points for the statistic in (12) are not tabulated. Until such time as the necessary work [...] an expedient (which will narrow the bounds, though not as much as (12) will narrow them) is to use

$$\frac{\sum_{1}^{n-p-1} \lambda_i \xi_i^2}{\sum_{1}^{n-p-1} \xi_i^2},$$

obtained from (12) by adding back the terms corresponding to the (k − 1) smallest roots (the kth vector in the set being the vector corresponding to the constant term). For the values of n, p and k occurring in practice this will provide a lower bound.* The significance points for this statistic are tabulated by Durbin & Watson (see (3)) and can be read from their tables for $d_L$ putting k' = p.

We shall close this section with an example which originally suggested the problem. In Quenouille (1952, p. 183) an example is given of the regression of U.S. fertilizer consumption on an index of farm income over the years 1911-47. The two series contain a decided trend which appears to need a fifth degree polynomial for its removal. In this case, therefore, p = 1, k − 1 = k' = 5 and n = 37. In Table 2 the 5% points for the four statistics are shown for these p, k and n.


Table 2. 5% points 


dı di dí du 
1-14 1-42 1:75 1-88 
These significance points were obtained by using a Beta distribution with the appropriate mean and variance but will certainly not differ from the exact values by more than 0·02.

Quenouille quotes the first serial correlation of the residuals as 0·229, which corresponds to a value of d slightly less than 1·54. Using Durbin & Watson's original bounds no conclusion may be reached. Using the conservative bound (obtained from Durbin & Watson's tables) this is still so. However, using the bound given by (13) it can unquestionably

* For [...] this will certainly be so.
be said that the serial correlation is significant at the 5% point (as Quenouille conjectured).


In this case the random regressor is positively serially correlated and there can be no doubt 
of the validity of the result based on d;. 
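The arithmetic of this example can be retraced with a few lines of code. This is a sketch under assumptions: the relation d ≈ 2(1 − r₁) is the usual approximation, which ignores end effects (this is why the text speaks of a value slightly less than 1·54), and reading the first three entries of Table 2 as lower-type bounds and the last as the upper bound is an interpretation of the garbled table header, not something the source states explicitly.

```python
def d_from_first_serial_correlation(r1):
    """Approximate d by 2(1 - r1); end effects are ignored, so this is only approximate."""
    return 2.0 * (1.0 - r1)

# 5% points of Table 2, in the order printed (p = 1, k - 1 = 5, n = 37).
# Assumption: the first three are lower-type bounds, the last is the upper bound.
lower_type_points = [1.14, 1.42, 1.75]
upper_point = 1.88

d = d_from_first_serial_correlation(0.229)

# A value of d below a lower bound to the significance point is significant.
decisive_points = [point for point in lower_type_points if d < point]
```

Only the sharpest of the lower-type points exceeds d here, which matches the statement that the original and the conservative bounds are inconclusive while the sharper bound gives significance.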


I should like to thank Dr G. S. Watson for some helpful discussion in connexion with this 
work. 
ON THE ANALYSIS OF MULTIPLE REGRESSION IN k CATEGORIES


By S. KULLBACK AND H. M. ROSENBLATT*
The George Washington University and Bureau of Ordnance, U.S. Navy Dept.


SUMMARY. Tho general model, an information theoretic ap 
of hypotheses concerning sets of partial regre: 


e of stochastic depen: 


of each category reduces to a special case. The cast 
ultivariate linear hypothesis. 


included by Kullback (1956) in his discussion of the m 


1. INTRODUCTION


We first consider the problem of the normal linear hypothesis (Kolodziejczyk, 1935) from the viewpoint of information theory, and then apply the results and procedure to the problem of tests of hypotheses about sets of partial regression coefficients. It is believed that this approach has some merit, at least from the pedagogical point of view, and has served as the basis for teaching some aspects of Design of Experiments at the George Washington University. Although the general theory given has wider applicability, the specific results given here may serve some needs for k category multiple regression analysis. Aside from Carter's (1949) paper, the k category multiple regression case does not seem to be covered in the literature, and has not been found in recent texts on statistical theory or method (cf. Kempthorne, 1952; Kendall, 1946; Pearson & Wilks, 1933; Rao, 1952; Welch, 1935; Williams, 1953). Study of the problem by Rosenblatt (1953) was stimulated by the needs of an applied problem at the Naval Ordnance Laboratory, White Oak, Md. Complete results and a general theory are presented here. Matrix notation and theory are used and the results are illustrated by application to certain numerical data. Matrices will be denoted by upper case bold face type, e.g. $\mathbf{A} = (a_{ij})$, $\mathbf{X} = (x_{ij})$, etc. (i = 1, 2, ..., m; j = 1, 2, ..., n), while vectors or 1 row or column matrices will be denoted by lower case bold face type; e.g. $\mathbf{x}' = (x_1, x_2, \dots, x_n)$, $\boldsymbol\mu' = (\mu_1, \mu_2, \dots, \mu_p)$.
1·1. Information theory approach. Let $f_1(x)$ and $f_2(x)$ be the probability densities of populations specified by the statistical hypotheses $H_1$ and $H_2$, respectively. The mean information per observation from the population with probability density $f_1(x)$ for discrimination for $H_1$ against $H_2$ is defined by†

$$I(1:2) = \int f_1(x) \log \frac{f_1(x)}{f_2(x)}\, dx, \qquad (1\cdot1)$$

where x can be taken as a vector variable (multivariate). The divergence between $H_1$ and $H_2$, a measure of the difficulty of discriminating between them, is defined by

$$J(1,2) = \int [f_1(x) - f_2(x)] \log \frac{f_1(x)}{f_2(x)}\, dx. \qquad (1\cdot2)$$

* Now with the Office of Naval Research, U.S. Navy Department, Washington, D.C. Presented 24 March 1955 at a Bureau of Ordnance Seminar of Statisticians held at the University of Chicago.
† See Kullback & Leibler (1951) for the general case.
For the special case when $f_1(x)$ and $f_2(x)$ are p-variate normal densities, with the same matrix of variances and covariances, $\boldsymbol\Sigma$, and mean values

$$E_1(\mathbf{x}') = \boldsymbol\mu_1' = (\mu_{11}, \mu_{12}, \dots, \mu_{1p}) \quad \text{and} \quad E_2(\mathbf{x}') = \boldsymbol\mu_2' = (\mu_{21}, \mu_{22}, \dots, \mu_{2p}),$$

it is found (Kullback, 1952) that (1·1) and (1·2) yield

$$2I(1:2) = J(1,2) = (\boldsymbol\mu_1 - \boldsymbol\mu_2)'\,\boldsymbol\Sigma^{-1}(\boldsymbol\mu_1 - \boldsymbol\mu_2), \qquad (1\cdot3)$$

where the right-hand side of (1·3) is proportional to Mahalanobis's generalized distance (see, for example, Rao, 1952).
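Relation (1·3) can be checked against the standard closed form for the mean information between two p-variate normal densities. The sketch below is illustrative: the function name and the numerical values are assumptions, and the closed form used is the well-known general expression (with possibly different covariance matrices), not a formula quoted from the paper.

```python
import numpy as np

def info_normal(mu1, cov1, mu2, cov2):
    """I(1:2) = integral of f1 log(f1/f2) for p-variate normals (standard closed form)."""
    p = len(mu1)
    inv2 = np.linalg.inv(cov2)
    diff = mu1 - mu2
    return 0.5 * (np.trace(inv2 @ cov1) + diff @ inv2 @ diff - p
                  + np.log(np.linalg.det(cov2) / np.linalg.det(cov1)))

# Illustrative values only (not from the paper)
mu1 = np.array([1.0, 0.0, -1.0])
mu2 = np.array([0.0, 0.5, 1.0])
sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])

i12 = info_normal(mu1, sigma, mu2, sigma)
i21 = info_normal(mu2, sigma, mu1, sigma)
j12 = i12 + i21                                   # divergence J(1,2) = I(1:2) + I(2:1)
mahalanobis = (mu1 - mu2) @ np.linalg.solve(sigma, mu1 - mu2)
```

With a common covariance matrix the two directed informations coincide, so the divergence is twice either of them, as (1·3) states.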


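1·2. The linear hypothesis, single category. We now consider the set of n observations on p + 1 variates

$$\mathbf{z} = \mathbf{y} - \mathbf{X}\boldsymbol\beta, \qquad (1\cdot4)$$

where $\mathbf{z}' = (z_1, z_2, \dots, z_n)$, $\mathbf{y}' = (y_1, y_2, \dots, y_n)$, $\boldsymbol\beta' = (\beta_1, \beta_2, \dots, \beta_p)$, $\mathbf{X} = (x_{ir})$ (i = 1, 2, ..., n; r = 1, 2, ..., p; p < n), such that

(a) the z's are independent, normally distributed variables with zero means and variance σ²,

(b) the $x_{ir}$'s are considered to be known,

(c) X is of rank p,

(d) $\boldsymbol\beta = \boldsymbol\beta^1$ and $\boldsymbol\beta = \boldsymbol\beta^2$ are parameter matrices whose values are respectively given by the hypotheses $H_1$ and $H_2$,

(e) $E_1(\mathbf{y}) = \mathbf{X}\boldsymbol\beta^1$ and $E_2(\mathbf{y}) = \mathbf{X}\boldsymbol\beta^2$.

It is found that (1·3) yields for this case,

$$J(1,2) = (\mathbf{X}\boldsymbol\beta^1 - \mathbf{X}\boldsymbol\beta^2)'(\sigma^2\mathbf{I})^{-1}(\mathbf{X}\boldsymbol\beta^1 - \mathbf{X}\boldsymbol\beta^2) = (\boldsymbol\beta^1 - \boldsymbol\beta^2)'\mathbf{S}(\boldsymbol\beta^1 - \boldsymbol\beta^2)/\sigma^2, \qquad (1\cdot5)$$

where $\mathbf{S} = \mathbf{X}'\mathbf{X}$ is a p × p matrix of rank p and I is the n × n identity matrix.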
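Suppose $H_1$ imposes no restriction on β and $H_2$ provides some specific hypothetical value, $\boldsymbol\beta = \boldsymbol\beta^2$. We estimate J(1,2) by replacing the parameters by the best unbiased estimates appropriate to the hypotheses. That this procedure is based on the principle of discrimination between a null hypothesis and the alternative hypothesis, by using that distribution corresponding to the alternative hypothesis which for the sample values provides the least information for discrimination against the null hypothesis, is shown and discussed more generally by one of us (Kullback, 1956). The classical least-squares procedure of minimizing

$$\mathbf{z}'\mathbf{z} = (\mathbf{y}' - \boldsymbol\beta'\mathbf{X}')(\mathbf{y} - \mathbf{X}\boldsymbol\beta)$$

leads to the normal equations

$$\mathbf{S}\hat{\boldsymbol\beta} = \mathbf{X}'\mathbf{y}, \qquad (1\cdot6)$$

whose solutions are minimum variance unbiased estimates (cf. Kempthorne, 1952; Kolodziejczyk, 1935; Plackett, 1949), so that

$$\hat{J}(1,2) = (\hat{\boldsymbol\beta}^1 - \boldsymbol\beta^2)'\mathbf{S}(\hat{\boldsymbol\beta}^1 - \boldsymbol\beta^2)/\hat\sigma^2, \qquad (1\cdot7)$$

where

$$(n - p)\hat\sigma^2 = \hat{\mathbf{z}}'\hat{\mathbf{z}} = (\mathbf{y}' - \hat{\boldsymbol\beta}^{1\prime}\mathbf{X}')(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}^1) = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}^{1\prime}\mathbf{S}\hat{\boldsymbol\beta}^1.$$

In particular, for the common null hypothesis, $H_2$: $\boldsymbol\beta = \boldsymbol\beta^2 = \mathbf{0}$, (1·7) becomes

$$\hat{J}(1,2) = \hat{\boldsymbol\beta}^{1\prime}\mathbf{S}\hat{\boldsymbol\beta}^1/\hat\sigma^2. \qquad (1\cdot8)$$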
It is a known result in regression theory that the components of $\hat{\boldsymbol\beta}^1$, as linear functions of the y's, are normally distributed with covariance matrix given by $\sigma^2\mathbf{S}^{-1}$ (cf. Kempthorne, 1952).

Under the null hypothesis, that $\boldsymbol\beta = \boldsymbol\beta^2 = \mathbf{0}$, $\hat{J}(1,2)$ in (1·8) is therefore Hotelling's Generalized Student Ratio (Wilks, 1943, p. 238), or

$$\hat{J}(1,2) = pF, \qquad (1\cdot9)$$

where F has the analysis of variance distribution with $n_1 = p$ and $n_2 = n - p$ degrees of freedom. These results may also be summarized in the usual analysis of variance table as given in Table 1·1, where

$$\hat{\boldsymbol\beta}^{1\prime}\mathbf{S}\hat{\boldsymbol\beta}^1 = \hat{\boldsymbol\beta}^{1\prime}\mathbf{X}'\mathbf{y} = \mathbf{y}'\mathbf{X}\mathbf{S}^{-1}\mathbf{X}'\mathbf{y}.$$

For the more general case, corresponding to the hypothesis $H_2$: $\boldsymbol\beta = \boldsymbol\beta^2 \ne \mathbf{0}$, (1·9) still holds with $\hat{J}(1,2)$ given by (1·7).
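The identity $\hat{J}(1,2) = pF$ of (1·9) can be verified on simulated data by computing the estimated divergence from (1·6) to (1·8) and comparing it with the familiar variance ratio. This is a minimal sketch: the function name and the data are assumptions made for illustration only.

```python
import numpy as np

def estimated_divergence(x, y):
    """J-hat of (1.8) for the null hypothesis beta = 0, with sigma-hat^2 on n - p d.f."""
    n, p = x.shape
    s = x.T @ x                                   # S = X'X
    beta = np.linalg.solve(s, x.T @ y)            # normal equations (1.6)
    regression_ss = beta @ s @ beta               # beta' S beta
    sigma2 = (y @ y - regression_ss) / (n - p)    # (n - p) sigma-hat^2 = y'y - beta'S beta
    return beta, sigma2, regression_ss / sigma2

rng = np.random.default_rng(1)                    # simulated data, illustration only
n, p = 25, 3
x = rng.standard_normal((n, p))
y = x @ np.array([0.5, -1.0, 0.0]) + rng.standard_normal(n)

beta, sigma2, j_hat = estimated_divergence(x, y)

# Classical variance ratio from the residual sum of squares
residual_ss = np.sum((y - x @ beta) ** 2)
f_ratio = ((y @ y - residual_ss) / p) / (residual_ss / (n - p))
```

The two sums of squares in Table 1·1 add to y'y, which is what makes the two routes to F agree.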
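Table 1·1

Variation due to      D.F.     Sum of squares
Linear regression     p        $\hat{\boldsymbol\beta}^{1\prime}\mathbf{S}\hat{\boldsymbol\beta}^1 = \hat\sigma^2\hat{J}(1,2)$
Difference            n − p    $\mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}^{1\prime}\mathbf{S}\hat{\boldsymbol\beta}^1 = (n - p)\hat\sigma^2$
Total                 n        $\mathbf{y}'\mathbf{y}$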
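2. SUB-HYPOTHESES, SINGLE CATEGORY*

Suppose we partition the parameters β into two groups, which we will denote as $\boldsymbol\beta_1$ and $\boldsymbol\beta_2$, so that in place of (1·4) we now consider

$$\mathbf{z} = \mathbf{y} - (\mathbf{X}_1, \mathbf{X}_2)\begin{pmatrix} \boldsymbol\beta_1 \\ \boldsymbol\beta_2 \end{pmatrix},$$

where

$$\boldsymbol\beta = \begin{pmatrix} \boldsymbol\beta_1 \\ \boldsymbol\beta_2 \end{pmatrix} \quad \text{and} \quad \mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2) = \begin{pmatrix} x_{11} & \cdots & x_{1q} & x_{1,q+1} & \cdots & x_{1p} \\ \vdots & & \vdots & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nq} & x_{n,q+1} & \cdots & x_{np} \end{pmatrix},$$

with $\mathbf{X}_1$ and $\mathbf{X}_2$ respectively of ranks q and p − q. The z's are still assumed to be independent and normally distributed with means zero and common variance σ², and corresponding to $H_1$ and $H_2$,

$$E_1(\mathbf{y}) = \mathbf{X}_1\boldsymbol\beta_1^1 + \mathbf{X}_2\boldsymbol\beta_2^1, \qquad E_2(\mathbf{y}) = \mathbf{X}_1\boldsymbol\beta_1^2 + \mathbf{X}_2\boldsymbol\beta_2^2.$$

It also follows that

$$\mathbf{S} = \mathbf{X}'\mathbf{X} = \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix}, \qquad (2\cdot3)$$

where $\mathbf{S}_{11} = \mathbf{X}_1'\mathbf{X}_1$, $\mathbf{S}_{12} = \mathbf{X}_1'\mathbf{X}_2$, $\mathbf{S}_{21} = \mathbf{X}_2'\mathbf{X}_1$, $\mathbf{S}_{22} = \mathbf{X}_2'\mathbf{X}_2$.

* For an alternative treatment see Grundy (1951). We summarize the results here as a preliminary aid for the discussion in § 4.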
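For this case, (1·5) becomes

$$J(1,2) = (\boldsymbol\beta_1^{1\prime} - \boldsymbol\beta_1^{2\prime},\ \boldsymbol\beta_2^{1\prime} - \boldsymbol\beta_2^{2\prime}) \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix} \begin{pmatrix} \boldsymbol\beta_1^1 - \boldsymbol\beta_1^2 \\ \boldsymbol\beta_2^1 - \boldsymbol\beta_2^2 \end{pmatrix} \Big/ \sigma^2. \qquad (2\cdot4)$$

The normal equations of (1·6) under $H_1$ become

$$\begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol\beta}_1^1 \\ \hat{\boldsymbol\beta}_2^1 \end{pmatrix} = \begin{pmatrix} \mathbf{X}_1'\mathbf{y} \\ \mathbf{X}_2'\mathbf{y} \end{pmatrix}, \qquad (2\cdot5)$$

or

$$\mathbf{S}_{11}\hat{\boldsymbol\beta}_1^1 + \mathbf{S}_{12}\hat{\boldsymbol\beta}_2^1 = \mathbf{X}_1'\mathbf{y}, \qquad \mathbf{S}_{21}\hat{\boldsymbol\beta}_1^1 + \mathbf{S}_{22}\hat{\boldsymbol\beta}_2^1 = \mathbf{X}_2'\mathbf{y}, \qquad (2\cdot6)$$

and

$$(n - p)\hat\sigma^2 = \mathbf{y}'\mathbf{y} - (\hat{\boldsymbol\beta}_1^{1\prime}, \hat{\boldsymbol\beta}_2^{1\prime}) \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol\beta}_1^1 \\ \hat{\boldsymbol\beta}_2^1 \end{pmatrix}.$$

Let $\mathbf{S}_{22\cdot1} = \mathbf{S}_{22} - \mathbf{S}_{21}\mathbf{S}_{11}^{-1}\mathbf{S}_{12}$ and $\mathbf{X}_{2\cdot1}' = \mathbf{X}_2' - \mathbf{S}_{21}\mathbf{S}_{11}^{-1}\mathbf{X}_1'$; then (2·6) yields

$$\hat{\boldsymbol\beta}_2^1 = \mathbf{S}_{22\cdot1}^{-1}\mathbf{X}_{2\cdot1}'\mathbf{y}, \qquad (2\cdot7)$$

$$\hat{\boldsymbol\beta}_1^1 = \mathbf{S}_{11}^{-1}\mathbf{X}_1'\mathbf{y} - \mathbf{S}_{11}^{-1}\mathbf{S}_{12}\hat{\boldsymbol\beta}_2^1. \qquad (2\cdot8)$$

It is useful to note (see, for example, Frazer, Duncan & Collar, 1938, para. 4·9) that

$$\begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{S}_{11\cdot2}^{-1} & \mathbf{M} \\ \mathbf{M}' & \mathbf{S}_{22\cdot1}^{-1} \end{pmatrix},$$

where the q × (p − q) matrix

$$\mathbf{M} = -\mathbf{S}_{11}^{-1}\mathbf{S}_{12}\mathbf{S}_{22\cdot1}^{-1} = -\mathbf{S}_{11\cdot2}^{-1}\mathbf{S}_{12}\mathbf{S}_{22}^{-1},$$

so that in the applications, the elements of the matrix $\mathbf{S}_{11\cdot2}^{-1}$ or $\mathbf{S}_{22\cdot1}^{-1}$ are already obtained once the matrix $\mathbf{S}^{-1}$ is obtained.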

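Suppose now that in particular we want to test the null hypothesis $H_2$: $\boldsymbol\beta = \boldsymbol\beta^2 = \begin{pmatrix} \boldsymbol\beta_1^2 \\ \mathbf{0} \end{pmatrix}$, that is, $\boldsymbol\beta_2^2 = \mathbf{0}$, with no restrictions on $\boldsymbol\beta_1^2$, while under the alternative hypothesis $H_1$: $\boldsymbol\beta = \boldsymbol\beta^1 = \begin{pmatrix} \boldsymbol\beta_1^1 \\ \boldsymbol\beta_2^1 \end{pmatrix}$ with no restrictions on the parameters. Again we estimate J(1,2) by replacing the parameters by the best unbiased estimates appropriate to the hypotheses. Under $H_1$ we have the previous results for $\hat{\boldsymbol\beta}_1^1$, $\hat{\boldsymbol\beta}_2^1$ and $\hat\sigma^2$. Under $H_2$ the normal equations of (1·6) now yield

$$\hat{\boldsymbol\beta}_1^2 = \mathbf{S}_{11}^{-1}\mathbf{X}_1'\mathbf{y}. \qquad (2\cdot9)$$

From (2·4), (2·8) and (2·9) we have that

$$\hat{J}(1,2) = (\hat{\boldsymbol\beta}_1^{1\prime} - \hat{\boldsymbol\beta}_1^{2\prime},\ \hat{\boldsymbol\beta}_2^{1\prime}) \begin{pmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol\beta}_1^1 - \hat{\boldsymbol\beta}_1^2 \\ \hat{\boldsymbol\beta}_2^1 \end{pmatrix} \Big/ \hat\sigma^2 = \hat{\boldsymbol\beta}_2^{1\prime}\mathbf{S}_{22\cdot1}\hat{\boldsymbol\beta}_2^1/\hat\sigma^2. \qquad (2\cdot10)$$

It may be readily verified that

$$\mathbf{X}\mathbf{S}^{-1}\mathbf{X}' = \mathbf{X}_1\mathbf{S}_{11}^{-1}\mathbf{X}_1' + \mathbf{X}_{2\cdot1}\mathbf{S}_{22\cdot1}^{-1}\mathbf{X}_{2\cdot1}', \qquad (2\cdot11)$$

that is,

$$\hat{\boldsymbol\beta}^{1\prime}\mathbf{S}\hat{\boldsymbol\beta}^1 = \hat{\boldsymbol\beta}_1^{2\prime}\mathbf{S}_{11}\hat{\boldsymbol\beta}_1^2 + \hat{\boldsymbol\beta}_2^{1\prime}\mathbf{S}_{22\cdot1}\hat{\boldsymbol\beta}_2^1, \qquad (2\cdot12)$$

or

$$\hat{\boldsymbol\beta}^{1\prime}\mathbf{X}'\mathbf{y} = \hat{\boldsymbol\beta}_1^{2\prime}\mathbf{X}_1'\mathbf{y} + \hat{\boldsymbol\beta}_2^{1\prime}\mathbf{X}_{2\cdot1}'\mathbf{y}. \qquad (2\cdot13)$$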
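These results may be summarized in the following analysis of variance (Table 2·1), where $\hat{J}(1,2) = \hat{\boldsymbol\beta}_2^{1\prime}\mathbf{S}_{22\cdot1}\hat{\boldsymbol\beta}_2^1/\hat\sigma^2 = (p - q)F$, where F has the analysis of variance distribution with $n_1 = p - q$ and $n_2 = n - p$ degrees of freedom, under the null hypothesis $H_2$ that $\boldsymbol\beta_2^2 = \mathbf{0}$.


Table 2·1

Variation due to                                                      D.F.     Sum of squares
$H_2$: $\boldsymbol\beta^{2\prime} = (\boldsymbol\beta_1^{2\prime}, \mathbf{0}')$       q        $\hat{\boldsymbol\beta}_1^{2\prime}\mathbf{S}_{11}\hat{\boldsymbol\beta}_1^2$
Diff.                                                                 p − q    $\hat{\boldsymbol\beta}_2^{1\prime}\mathbf{S}_{22\cdot1}\hat{\boldsymbol\beta}_2^1 = \hat\sigma^2\hat{J}(1,2)$
$H_1$: $\boldsymbol\beta^{1\prime} = (\boldsymbol\beta_1^{1\prime}, \boldsymbol\beta_2^{1\prime})$   p        $\hat{\boldsymbol\beta}^{1\prime}\mathbf{S}\hat{\boldsymbol\beta}^1$
Diff.                                                                 n − p    $\mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}^{1\prime}\mathbf{S}\hat{\boldsymbol\beta}^1 = (n - p)\hat\sigma^2$
Total                                                                 n        $\mathbf{y}'\mathbf{y}$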

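3. ANALYSIS OF REGRESSION—ONE-WAY CLASSIFICATION, k CATEGORIES*

Suppose we have k categories each with $n_j$ observations on $(y, x_1, \dots, x_p)$ for which the general linear regression for each category is

$$z_{ji} = y_{ji} - \beta_{j1}x_{ji1} - \beta_{j2}x_{ji2} - \dots - \beta_{jp}x_{jip}, \qquad (3\cdot1)$$

where j = 1, 2, ..., k categories, i = 1, 2, ..., $n_j$ observations for category j, r = 1, 2, ..., p independent variables ($p < n_j$), the $z_{ji}$ are independent, normally distributed with zero means and common variance σ², and the $x_{jir}$ are known.

The linear regressions for each category can be written as

$$\mathbf{z}_j = \mathbf{y}_j - \mathbf{X}_j\boldsymbol\beta_j, \qquad (3\cdot2)$$

where for j = 1, 2, ..., k

$$\mathbf{z}_j' = (z_{j1}, z_{j2}, \dots, z_{jn_j}), \qquad \mathbf{y}_j' = (y_{j1}, y_{j2}, \dots, y_{jn_j}),$$

$$\mathbf{X}_j = (\mathbf{x}_{j1}, \mathbf{x}_{j2}, \dots, \mathbf{x}_{jp}), \qquad \mathbf{x}_{jr}' = (x_{j1r}, x_{j2r}, \dots, x_{jn_jr}),$$

and $\boldsymbol\beta_j' = (\beta_{j1}, \beta_{j2}, \dots, \beta_{jp})$.

We may write the k sets of regression equations (3·2) for k categories combined as

$$\mathbf{z} = \mathbf{y} - \mathbf{X}\boldsymbol\beta, \qquad (3\cdot3)$$

by defining

$$\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{X}_k \end{pmatrix}, \qquad \boldsymbol\beta' = (\boldsymbol\beta_1', \boldsymbol\beta_2', \dots, \boldsymbol\beta_k'),$$

$$\mathbf{z}' = (\mathbf{z}_1', \mathbf{z}_2', \dots, \mathbf{z}_k'), \qquad \mathbf{y}' = (\mathbf{y}_1', \mathbf{y}_2', \dots, \mathbf{y}_k').$$

By the preceding definitions we consider β in (3·3) as the parameter matrix of all kp regression coefficients $\beta_{jr}$, whether or not any of them are equal, or have a particular value including zero, according to any hypothesis.

* For the case p = 1, see Kendall (1946) and Welch (1935).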
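Suppose, however, we specify a null hypothesis with regard to certain groups or sets of the parameters $\beta_{jr}$ among the k categories, and wish to estimate the parameters and test the hypothesis against some alternative. To distinguish between matrices or parameter vectors under various hypotheses $H_\alpha$ (α = 1, 2, ...) we will use, where desirable for clarity or emphasis, the notation $\mathbf{X}^\alpha$, $\boldsymbol\beta^\alpha$ and $\mathbf{S}^\alpha = \mathbf{X}^{\alpha\prime}\mathbf{X}^\alpha$. Where this notation is not used the applicable hypothesis and definition of the matrices will be clear from the context. For any hypothesis, $H_\alpha$, we will represent the linear regressions for the k categories combined under $H_\alpha$ as

$$\mathbf{z}^\alpha = \mathbf{y} - \mathbf{X}^\alpha\boldsymbol\beta^\alpha, \qquad (3\cdot4)$$

where z and y are specified as in (3·3); however, we now define $\boldsymbol\beta^\alpha$ as the matrix of distinct regression coefficients specified by hypothesis $H_\alpha$, and $\mathbf{X}^\alpha$, the matrix of $x_{jir}$ with distinct regression effects, specified according to the regression model defined by the hypothesis for the k categories combined.

With the representation (3·4) of the k category regression under $H_\alpha$, the normal equations (1·6) become

$$\mathbf{S}^\alpha\hat{\boldsymbol\beta}^\alpha = \mathbf{X}^{\alpha\prime}\mathbf{y} \quad \text{and} \quad \hat{\boldsymbol\beta}^\alpha = (\mathbf{S}^\alpha)^{-1}\mathbf{X}^{\alpha\prime}\mathbf{y}, \qquad (3\cdot5)$$

where the elements of $\mathbf{S}^\alpha = \mathbf{X}^{\alpha\prime}\mathbf{X}^\alpha$ will, of course, depend on the particular specification of the matrix $\mathbf{X}^\alpha$.

Also, equivalent to (1·7) we have, for a null hypothesis $H_2$ and an alternative $H_1$,

$$\hat{J}(1,2) = (\mathbf{X}^1\hat{\boldsymbol\beta}^1 - \mathbf{X}^2\hat{\boldsymbol\beta}^2)'(\mathbf{X}^1\hat{\boldsymbol\beta}^1 - \mathbf{X}^2\hat{\boldsymbol\beta}^2)/\hat\sigma^2 = (\hat{\boldsymbol\beta}^{1\prime}\mathbf{S}^1\hat{\boldsymbol\beta}^1 - \hat{\boldsymbol\beta}^{2\prime}\mathbf{S}^2\hat{\boldsymbol\beta}^2)/\hat\sigma^2, \qquad (3\cdot6)$$

where

$$(N - pk)\hat\sigma^2 = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}^{1\prime}\mathbf{S}^1\hat{\boldsymbol\beta}^1, \qquad (3\cdot7)$$

$N = n_1 + n_2 + \dots + n_k$ and $\mathbf{S} = \mathbf{X}'\mathbf{X} = \mathbf{S}^1$ for X defined in (3·3). Thus, for any particular hypothesis on the sets of regression coefficients in k category regression, the estimates for the coefficients and the test of the hypothesis are readily obtained solely by proper specification of the matrices $\mathbf{X}^\alpha$ and $\boldsymbol\beta^\alpha$ in (3·4).

Consider the two hypotheses,

$$H_1: \beta_{jr} = \beta_{jr} \quad (j = 1, 2, \dots, k;\ r = 1, 2, \dots, p), \qquad (3\cdot8)$$

i.e. the $\beta_{jr}$ are different for all categories and for each r = 1, 2, ..., p; and the null hypothesis

$$H_2: \beta_{jr} = \beta_{\cdot r} \quad (j = 1, 2, \dots, k;\ r = 1, 2, \dots, p), \qquad (3\cdot9)$$

or equivalently, $\boldsymbol\beta_j = \boldsymbol\beta_\cdot$, with $\boldsymbol\beta_\cdot' = (\beta_{\cdot1}, \beta_{\cdot2}, \dots, \beta_{\cdot p})$ (j = 1, 2, ..., k); i.e. the regression coefficients are the same for the different categories for each r = 1, 2, ..., p.

Corresponding to $H_1$ in (3·8) the best unbiased estimate of β is derived from (3·5), where for the hypothesis $H_1$, $\boldsymbol\beta^1$ and $\mathbf{X}^1$ in (3·4), defining the k category regression model, are the same as β and X in (3·3), or

$$\begin{pmatrix} \mathbf{S}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{S}_k \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol\beta}_1 \\ \vdots \\ \hat{\boldsymbol\beta}_k \end{pmatrix} = \begin{pmatrix} \mathbf{X}_1'\mathbf{y}_1 \\ \vdots \\ \mathbf{X}_k'\mathbf{y}_k \end{pmatrix}, \qquad (3\cdot10)$$

that is, k sets of normal equations

$$\mathbf{S}_j\hat{\boldsymbol\beta}_j = \mathbf{X}_j'\mathbf{y}_j \quad (j = 1, 2, \dots, k), \qquad (3\cdot11)$$

from which

$$\hat{\boldsymbol\beta}_j = \mathbf{S}_j^{-1}\mathbf{X}_j'\mathbf{y}_j.$$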
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Corres ing in (3-9 g the 
ponding to H, in (3:0 ). however, the matrices X2 and B2 of (3-4), defining the k 
2:4), 2e A 


cat ry ressi 
ategory regression model. are 


Sa = (Xi. X. — er ps ag (us dao 


Ther k k 
a efore S2 ^a Na ^ k 
, 2- X2X = YXX = SS i 
XX 5 XjX;= XS, X2y = X X]y, 
j=l jul = 


an : . 
d the best unbiased estimate of under H, is derived from (3:5) as 


62 = S: Xy. (3:12) 
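We also have, under $H_1$, corresponding to (3·7),
$$(N - pk)\hat\sigma^2 = y'y - \hat\beta^{1\prime}S^1\hat\beta^1 = \sum_{j=1}^{k} (y_j'y_j - \hat\beta_j'S_j\hat\beta_j). \qquad (3\cdot13)$$
Corresponding to (3·6) we therefore have
$$\hat\sigma^2 J(1,2) = \hat\beta^{1\prime}S^1\hat\beta^1 - \hat\beta^{2\prime}S^2\hat\beta^2 = \sum_{j=1}^{k} \hat\beta_j'S_j\hat\beta_j - \hat\beta^{2\prime}S^2\hat\beta^2. \qquad (3\cdot14)$$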


The latter result is a direct generalization of Kendall's $S$ (1946, § 24·30).

We may therefore summarize in the analysis of variance Table 3·1, where $J(1,2) = p(k-1)F$, and $F$ has the analysis of variance distribution with $p(k-1)$ and $(N - pk)$ degrees of freedom when the null hypothesis $H_2$ of (3·9) is true.


Table 3·1

Variance due to                   | D.F.            | Sum of squares
$H_2$: $\beta_j = \beta_{\cdot}$  | $p$             | $\hat\beta^{2\prime}S^2\hat\beta^2$
Diff.                             | $p(k-1)$        | $\hat\beta^{1\prime}S^1\hat\beta^1 - \hat\beta^{2\prime}S^2\hat\beta^2$
$H_1$: $\beta_j = \beta_j$        | $pk$            | $\hat\beta^{1\prime}S^1\hat\beta^1 = \sum_j \hat\beta_j'S_j\hat\beta_j$
Diff.                             | $N - pk$        | $y'y - \hat\beta^{1\prime}S^1\hat\beta^1 = (N - pk)\hat\sigma^2$
Total                             | $N = \sum n_j$  | $y'y$
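
The quantities entering Table 3·1 can be illustrated numerically. The following is a minimal sketch, assuming each category supplies a design matrix $X_j$ (as a list of rows) and a response vector $y_j$; the function names and the toy data are hypothetical and are not taken from the paper. It forms $S_j$ and $X_j'y_j$, solves the separate normal equations under $H_1$ and the pooled ones under $H_2$, and returns $J(1,2)$ and $F$ as in (3·13) and (3·14).

```python
# Sketch of the Section 3 test of H2 (one common regression) against
# H1 (a separate regression in each category). Standard library only.
# All function names and the toy data are hypothetical illustrations.

def gram(X, y):
    """Return S = X'X and X'y for one design matrix X (list of rows)."""
    p = len(X[0])
    S = [[sum(row[r] * row[s] for row in X) for s in range(p)] for r in range(p)]
    Xty = [sum(row[r] * v for row, v in zip(X, y)) for r in range(p)]
    return S, Xty

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def equality_test(groups):
    """groups: list of (X_j, y_j). Returns (J, F, df1, df2)."""
    k, p = len(groups), len(groups[0][0][0])
    N = sum(len(y) for _, y in groups)
    yy = sum(v * v for _, y in groups for v in y)
    ss1 = 0.0                                # sum_j b_j' S_j b_j under H1
    S2 = [[0.0] * p for _ in range(p)]       # S^2 = sum_j S_j
    X2y = [0.0] * p                          # X^2'y = sum_j X_j'y_j
    for X, y in groups:
        S, Xty = gram(X, y)
        ss1 += dot(solve(S, Xty), Xty)       # b'Sb equals b'X'y at the solution
        S2 = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(S2, S)]
        X2y = [a + b for a, b in zip(X2y, Xty)]
    ss2 = dot(solve(S2, X2y), X2y)           # pooled fit under H2
    sigma2 = (yy - ss1) / (N - p * k)        # residual mean square, (3.13)
    J = (ss1 - ss2) / sigma2                 # (3.14)
    return J, J / (p * (k - 1)), p * (k - 1), N - p * k

x = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, v] for v in x]                    # column of ones, then x
group_a = (X, [1.1, 2.9, 5.2, 6.8])          # slope near 2
group_b = (X, [1.0, 2.0, 2.9, 4.1])          # slope near 1
J, F, df1, df2 = equality_test([group_a, group_b])
print(round(F, 1), df1, df2)
```

The pooled matrices are accumulated as sums of the category matrices, so the raw observations need be passed over only once.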


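4. SUB-HYPOTHESIS—ONE-WAY CLASSIFICATION, k CATEGORIES


4·1. Two-partition sub-hypothesis. Let us partition the parameters of the matrix $\beta_j$, for each category $j = 1, 2, \dots, k$, into two parts
$$\beta_{j1}' = (\beta_{j1}, \beta_{j2}, \dots, \beta_{jq}) \quad\text{and}\quad \beta_{j2}' = (\beta_{jq+1}, \beta_{jq+2}, \dots, \beta_{jp})$$
of $q$ and $p - q$ parameters respectively, $q < p$, so that $\beta_j' = (\beta_{j1}', \beta_{j2}')$.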
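Consider a sub-hypothesis, $H_2$, which states that for $j = 1, 2, \dots, k$, the $\beta_{jr}$ are different for $r = 1, 2, \dots, q$, while for $r = q+1, q+2, \dots, p$, there is a common value $\beta_{\cdot r}$ for the $\beta_{jr}$; thus
$$H_2: \begin{cases} \beta_{jr} = \beta_{jr} & (j = 1, 2, \dots, k;\ r = 1, 2, \dots, q), \\ \beta_{jr} = \beta_{\cdot r} & (j = 1, 2, \dots, k;\ r = q+1, q+2, \dots, p), \end{cases} \qquad (4\cdot1)$$
or equivalently
$$H_2: \beta_{j1} = \beta_{j1}, \qquad \beta_{j2} = \beta_{\cdot2}, \qquad \beta_{\cdot2}' = (\beta_{\cdot q+1}, \dots, \beta_{\cdot p}).$$

Let $H_1$ remain as in (3·8), that is, the $\beta_{jr}$ are different for all $j$ and $r$. Then under $H_1$ we have the same matrix definitions and results as in §3. However, for $H_2$ as given in (4·1), the matrices $X^2$ and $\beta^2$ for the $k$ category regression model are
$$\beta^{2\prime} = (\beta_1', \beta_2'), \qquad X^2 = (X_1, X_2),$$
where $\beta_1' = (\beta_{11}', \beta_{21}', \dots, \beta_{k1}')$, $\beta_2' = (\beta_{\cdot q+1}, \beta_{\cdot q+2}, \dots, \beta_{\cdot p}) = \beta_{\cdot2}'$,
$$X_1 = \begin{pmatrix} X_{11} & & 0 \\ & \ddots & \\ 0 & & X_{k1} \end{pmatrix}, \qquad X_2 = \begin{pmatrix} X_{12} \\ \vdots \\ X_{k2} \end{pmatrix},$$
$$X_{j1} = (x_{j1}, x_{j2}, \dots, x_{jq}), \qquad X_{j2} = (x_{jq+1}, x_{jq+2}, \dots, x_{jp}),$$
$$x_{jr}' = (x_{j1r}, x_{j2r}, \dots, x_{jn_jr}) \quad (j = 1, 2, \dots, k;\ r = 1, 2, \dots, p).$$
Thus under $H_2$
$$S^2 = X^{2\prime}X^2 = \begin{pmatrix} X_1' \\ X_2' \end{pmatrix}(X_1, X_2) = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix},$$
where $S_{11} = X_1'X_1$, $S_{12} = X_1'X_2 = S_{21}'$, $S_{22} = X_2'X_2$,
$$S_{11} = \begin{pmatrix} S_{111} & & 0 \\ & \ddots & \\ 0 & & S_{k11} \end{pmatrix}, \qquad S_{12} = \begin{pmatrix} S_{112} \\ \vdots \\ S_{k12} \end{pmatrix}, \qquad S_{22} = \sum_{j=1}^{k} S_{j22},$$
and where $S_{j11} = X_{j1}'X_{j1}$, $S_{j12} = X_{j1}'X_{j2} = S_{j21}'$, $S_{j22} = X_{j2}'X_{j2}$.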
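From the normal equations (3·5) we now obtain
$$\begin{aligned} S_{11}\hat\beta_1 + S_{12}\hat\beta_2 &= X_1'y, \\ S_{21}\hat\beta_1 + S_{22}\hat\beta_2 &= X_2'y, \end{aligned} \qquad (4\cdot2)$$
so that as in §2 we have
$$\hat\beta_2 = S_{22\cdot1}^{-1}X_{2\cdot1}'y, \qquad (4\cdot3)$$
where
$$S_{22\cdot1} = (S_{22} - S_{21}S_{11}^{-1}S_{12}) = \sum_{j=1}^{k} (S_{j22} - S_{j21}S_{j11}^{-1}S_{j12}) = \sum_{j=1}^{k} S_{j22\cdot1},$$
$$X_{2\cdot1}' = (X_{12\cdot1}', X_{22\cdot1}', \dots, X_{k2\cdot1}') \quad\text{and}\quad X_{2\cdot1}'y = \sum_{j=1}^{k} X_{j2\cdot1}'y_j.$$

Also as in (2·8), and from the definition of matrices under $H_2$, we have
$$\begin{pmatrix} \hat\beta_{11} \\ \vdots \\ \hat\beta_{k1} \end{pmatrix} = \begin{pmatrix} S_{111}^{-1} & & 0 \\ & \ddots & \\ 0 & & S_{k11}^{-1} \end{pmatrix} \left[ \begin{pmatrix} X_{11}'y_1 \\ \vdots \\ X_{k1}'y_k \end{pmatrix} - \begin{pmatrix} S_{112} \\ \vdots \\ S_{k12} \end{pmatrix} \hat\beta_2 \right]. \qquad (4\cdot4)$$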


ll 
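Thus under $H_2$ of (4·1), we have the following estimates of the regression coefficients:
$$\begin{aligned} \hat\beta_{j1} &= S_{j11}^{-1}(X_{j1}'y_j - S_{j12}\hat\beta_{\cdot2}) = \hat\beta_{j1}^2 \quad (j = 1, 2, \dots, k), \\ \hat\beta_{\cdot2} &= \Big(\sum_{j=1}^{k} S_{j22\cdot1}\Big)^{-1} \sum_{j=1}^{k} X_{j2\cdot1}'y_j = \hat\beta_{\cdot2}^2, \end{aligned} \qquad (4\cdot5)$$
and $\hat\beta_j' = (\hat\beta_{j1}', \hat\beta_{\cdot2}') = \hat\beta_j^{2\prime}$.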
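
The estimates (4·5) take a familiar form in the simplest case $q = 1$, $p = 2$, where each category keeps its own constant term and all categories share one slope. The following is a minimal sketch under that assumption; the function name and the data are hypothetical. Here $S_{j11} = n_j$, $S_{j12} = \sum_i x_{ji}$ and $S_{j22} = \sum_i x_{ji}^2$, so that $S_{j22\cdot1}$ is the sum of squares of $x$ about its category mean.

```python
# Sketch of the two-partition estimates (4.5) for q = 1, p = 2:
# a separate constant term per category (x_ji1 = 1) and one common slope.
# The function name and the data are hypothetical illustrations.

def common_slope_fit(groups):
    """groups: list of (x_j, y_j). Returns ([b_11, ..., b_k1], b_.2)."""
    s22_1 = 0.0      # sum_j (S_j22 - S_j21 S_j11^-1 S_j12)
    x21_y = 0.0      # sum_j X_j2.1' y_j
    for x, y in groups:
        n = len(x)                                # S_j11
        sx, sy = sum(x), sum(y)                   # S_j12 and X_j1' y_j
        sxx = sum(v * v for v in x)               # S_j22
        sxy = sum(u * v for u, v in zip(x, y))    # X_j2' y_j
        s22_1 += sxx - sx * sx / n
        x21_y += sxy - sx * sy / n
    slope = x21_y / s22_1                         # common coefficient
    # b_j1 = S_j11^-1 (X_j1' y_j - S_j12 * slope)
    intercepts = [(sum(y) - slope * sum(x)) / len(x) for x, y in groups]
    return intercepts, slope

groups = [
    ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),     # exactly y = 1 + 2x
    ([1.0, 2.0, 4.0], [7.0, 9.0, 13.0]),    # exactly y = 5 + 2x
]
intercepts, slope = common_slope_fit(groups)
print(intercepts, slope)
```

With data lying exactly on two parallel lines, the sketch recovers the two constant terms and the shared slope, which is a convenient check on the partitioned formulas.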


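If under $H_1$ of (3·8) we also define $\beta_j' = (\beta_{j1}', \beta_{j2}')$ but rearrange and partition the sub-matrices of $\beta$ and $X$ so that
$$\beta' = (\beta_1', \beta_2'), \qquad X = (X_1, X_2),$$
where
$$\beta_1' = (\beta_{11}', \beta_{21}', \dots, \beta_{k1}'), \qquad \beta_2' = (\beta_{12}', \beta_{22}', \dots, \beta_{k2}'),$$
$$\beta_{j1}' = (\beta_{j1}, \beta_{j2}, \dots, \beta_{jq}), \qquad \beta_{j2}' = (\beta_{jq+1}, \beta_{jq+2}, \dots, \beta_{jp}),$$
$$X_1 = \begin{pmatrix} X_{11} & & 0 \\ & \ddots & \\ 0 & & X_{k1} \end{pmatrix}, \qquad X_2 = \begin{pmatrix} X_{12} & & 0 \\ & \ddots & \\ 0 & & X_{k2} \end{pmatrix},$$
thus
$$X_1'X_1 = S_{11} = \begin{pmatrix} S_{111} & & 0 \\ & \ddots & \\ 0 & & S_{k11} \end{pmatrix}, \quad X_1'X_2 = S_{12} = \begin{pmatrix} S_{112} & & 0 \\ & \ddots & \\ 0 & & S_{k12} \end{pmatrix}, \quad X_2'X_2 = S_{22} = \begin{pmatrix} S_{122} & & 0 \\ & \ddots & \\ 0 & & S_{k22} \end{pmatrix},$$
and
$$S_{j11} = X_{j1}'X_{j1}, \qquad S_{j12} = X_{j1}'X_{j2} = S_{j21}', \qquad S_{j22} = X_{j2}'X_{j2} \quad (j = 1, 2, \dots, k).$$

We then obtain the same estimate of $\beta_j$ $(j = 1, 2, \dots, k)$ as in §3, by the procedure of §2, that is, from (3·5) we have
$$\begin{aligned} S_{11}\hat\beta_1 + S_{12}\hat\beta_2 &= X_1'y, \\ S_{21}\hat\beta_1 + S_{22}\hat\beta_2 &= X_2'y, \end{aligned} \qquad (4\cdot6)$$
$$\hat\beta_2 = S_{22\cdot1}^{-1}X_{2\cdot1}'y, \qquad \hat\beta_1 = S_{11}^{-1}(X_1'y - S_{12}\hat\beta_2), \qquad (4\cdot7)$$
where
$$S_{22\cdot1} = (S_{22} - S_{21}S_{11}^{-1}S_{12}) = \begin{pmatrix} S_{122\cdot1} & & 0 \\ & \ddots & \\ 0 & & S_{k22\cdot1} \end{pmatrix}, \qquad X_{2\cdot1} = \begin{pmatrix} X_{12\cdot1} & & 0 \\ & \ddots & \\ 0 & & X_{k2\cdot1} \end{pmatrix},$$
$$S_{j22\cdot1} = (S_{j22} - S_{j21}S_{j11}^{-1}S_{j12}), \qquad X_{j2\cdot1}' = X_{j2}' - S_{j21}S_{j11}^{-1}X_{j1}' \quad (j = 1, 2, \dots, k), \qquad (4\cdot8)$$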
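so that from (4·7) we obtain under $H_1$ for each category $j = 1, 2, \dots, k$,
$$\hat\beta_{j1} = S_{j11}^{-1}(X_{j1}'y_j - S_{j12}\hat\beta_{j2}) = \hat\beta_{j1}^1, \qquad (4\cdot9)$$
$$\hat\beta_{j2} = S_{j22\cdot1}^{-1}X_{j2\cdot1}'y_j = \hat\beta_{j2}^1, \qquad (4\cdot10)$$
and $\hat\beta_j' = (\hat\beta_{j1}', \hat\beta_{j2}') = \hat\beta_j^{1\prime}$.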


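With these estimates of the parameters under $H_1$ of (3·8) and $H_2$ of (4·1) and noting after some reduction that
$$\hat\beta^{\alpha\prime}S^{\alpha}\hat\beta^{\alpha} = y'X_1^{\alpha}(S_{11}^{\alpha})^{-1}X_1^{\alpha\prime}y + \hat\beta_2^{\alpha\prime}S_{22\cdot1}^{\alpha}\hat\beta_2^{\alpha} \quad (\alpha = 1, 2), \qquad (4\cdot11)$$
we obtain
$$\hat\sigma^2 J(1,2) = \hat\beta_2^{1\prime}S_{22\cdot1}^{1}\hat\beta_2^{1} - \hat\beta_2^{2\prime}S_{22\cdot1}^{2}\hat\beta_2^{2}, \qquad (4\cdot12)$$
where for computational convenience we may write
$$\hat\beta_2^{1\prime}S_{22\cdot1}^{1}\hat\beta_2^{1} = \hat\beta_2^{1\prime}X_{2\cdot1}^{1\prime}y = \sum_{j=1}^{k} \hat\beta_{j2}'X_{j2\cdot1}'y_j, \qquad (4\cdot13)$$
$$\hat\beta_2^{2\prime}S_{22\cdot1}^{2}\hat\beta_2^{2} = \hat\beta_{\cdot2}'X_{2\cdot1}^{2\prime}y. \qquad (4\cdot14)$$
We may therefore summarize in the analysis of variance Table 4·1, where
$$J(1,2) = (p-q)(k-1)F,$$
and $F$ has the analysis of variance distribution with $(p-q)(k-1)$ and $N - pk$ degrees of freedom when the null hypothesis $H_2$ of (4·1) is true.


Table 4·1

Variation due to | D.F. | Sum of squares
$H_2$: $\beta_{j1} = \beta_{j1}$, $\beta_{j2} = \beta_{\cdot2}$ | $qk + p - q$ | $\hat\beta^{2\prime}S^2\hat\beta^2$
Diff. | $(p-q)(k-1)$ | $\hat\beta^{1\prime}S^1\hat\beta^1 - \hat\beta^{2\prime}S^2\hat\beta^2 = \sum_{j=1}^{k} \hat\beta_{j2}'S_{j22\cdot1}\hat\beta_{j2} - \hat\beta_{\cdot2}'S_{22\cdot1}\hat\beta_{\cdot2} = \hat\sigma^2J(1,2)$
$H_1$: $\beta_{j1} = \beta_{j1}$, $\beta_{j2} = \beta_{j2}$ | $pk$ | $\hat\beta^{1\prime}S^1\hat\beta^1$
Diff. | $N - pk$ | $(N - pk)\hat\sigma^2 = y'y - \hat\beta^{1\prime}S^1\hat\beta^1$
Total | $N = n_1 + n_2 + \dots + n_k$ | $y'y$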

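4·2. Three-partition sub-hypothesis. If the sub-hypothesis requires partitioning the matrices $X$ and $\beta$ into three submatrices, so that
$$\beta' = (\beta_1', \beta_2', \beta_3') \quad\text{and}\quad X = (X_1, X_2, X_3),$$
we obtain from $S\hat\beta = X'y$ the solutions
$$\begin{aligned} \hat\beta_3 &= S_{33\cdot12}^{-1}X_{3\cdot12}'y, \\ \hat\beta_2 &= S_{22\cdot1}^{-1}(X_{2\cdot1}'y - S_{23\cdot1}\hat\beta_3), \\ \hat\beta_1 &= S_{11}^{-1}(X_1'y - S_{12}\hat\beta_2 - S_{13}\hat\beta_3), \end{aligned} \qquad (4\cdot15)$$
where
$$S = \begin{pmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \\ S_{31} & S_{32} & S_{33} \end{pmatrix}, \qquad S_{tu} = X_t'X_u \quad (t, u = 1, 2, 3),$$
and
$$S_{33\cdot12} = S_{33\cdot1} - S_{32\cdot1}S_{22\cdot1}^{-1}S_{23\cdot1}, \qquad S_{33\cdot1} = S_{33} - S_{31}S_{11}^{-1}S_{13},$$
$$S_{23\cdot1} = S_{23} - S_{21}S_{11}^{-1}S_{13} = S_{32\cdot1}', \qquad S_{22\cdot1} = S_{22} - S_{21}S_{11}^{-1}S_{12},$$
$$X_{2\cdot1}'y = (X_2' - S_{21}S_{11}^{-1}X_1')y, \qquad X_{3\cdot1}'y = (X_3' - S_{31}S_{11}^{-1}X_1')y,$$
$$X_{3\cdot12}'y = (X_{3\cdot1}' - S_{32\cdot1}S_{22\cdot1}^{-1}X_{2\cdot1}')y.$$

Also (compare with (4·11))
$$\hat\beta'S\hat\beta = y'X_1S_{11}^{-1}X_1'y + y'X_{2\cdot1}S_{22\cdot1}^{-1}X_{2\cdot1}'y + \hat\beta_3'S_{33\cdot12}\hat\beta_3. \qquad (4\cdot16)$$
Using (4·15) and collecting terms we obtain other useful forms of equation (4·16), e.g.
$$\hat\beta'S\hat\beta = y'X_1S_{11}^{-1}X_1'y + \hat\beta_2'X_{2\cdot1}'y + \hat\beta_3'X_{3\cdot1}'y, \qquad (4\cdot16\cdot1)$$
which is convenient when the data are raw observations and $x_{ji1} = 1$ for all $j$ and $i$, so that the first partition serves to include the $x_{ji1}$ and obtain deviations about average values for basically a two-partition problem, and
$$\hat\beta'S\hat\beta = \hat\beta_1'X_1'y + \hat\beta_2'X_2'y + \hat\beta_3'X_3'y \qquad (4\cdot16\cdot2)$$
for a three-partition problem where the variables already are deviations about average values.

The above discussion, if required, can readily be extended by induction to any number of partitions.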
43. ( , : velati 
ao - Carter’s regression case. Carter (1949) considers the case d bis — effect 
: ng the ith observations (i = 1,2;-«*) in each of / samples. His regression model can 
Written as " 
7 ei. cis 4:17 
£g = Yt Y fiic 7 9n AF 
r=1 
Wh à 
ere t ; is common to the ith 
obse the correlation effect among samples 15 due to a;, an element commo 
TV, k 
k. 
ticular ca: 


ation in each sample, j = L2: i 
ysis where 


t can be seen that this model is a par se of the sub-hypothesis anal 


the x , 
Xand B matrices are a’ = (8i 8) X= (X, Xo) 
"i = (Bi P2) 
d the submatrices are 
MEE opu a aiaia Boc tct 
Bi = (B Bir o Bist)» Bj = (Pv Bi „Bah B2= 
Xu 0 Viu jrg 
e 7 X=: an 
0 Xy Vint dein 


V 
S 
e = 
ga © 
——— 


/I 
x-fi} I 


I 
where $X_2$ is a $k \times 1$ matrix of submatrices $I$, the identity matrix of order $n \times n$. With these definitions of $X$ and $\beta$, Carter's normal equations (his (3·3)) for estimating the $\beta$'s follow directly from the normal equations (4·2) by obtaining
$$S_{11\cdot2}\hat\beta_1 = X_{1\cdot2}'y, \qquad (4\cdot18)$$
where $S_{11\cdot2} = S_{11} - S_{12}S_{22}^{-1}S_{21}$.
For this case we obtain
Ny, ly, l., 
( TE Xi Xy — 7, Xn Xe Be ~~ XuXa 
ly, Ies -- 
Si. = -pEnXa ( a i) XnXs pen Xe 
LV ad c k ; 
ly, L1 ll. 
—pXaXn —-jXhXa gis (- Xj Xa 
and 
1 , 1 ^, 1 " 
(1-4) 2i — 7 Xn ses -jXn 
1 1 1 
—— eG j ees , à NEP 
bpm can pia ( i) Xa oe jin 
NN lis. 1 
—7, Xia -gX —€ (-i)xs 
As before,
$$S_{11} = X_1'X_1, \qquad S_{12} = X_1'X_2 = S_{21}', \qquad S_{22} = X_2'X_2.$$

The estimates of the correlation effects $\alpha_i$ are not given specifically by Carter. The solution

i @=1,2,...,n), (4-19) 
= d d 

where Z% = p» 24 = 


follows directly from
$$S_{22}\hat\beta_2 = X_2'y - S_{21}\hat\beta_1. \qquad (4\cdot20)$$


5. EXAMPLE

As an example of the foregoing results, consider the performance data of a manufactured product tested under three environmental conditions (categories) each involving three independent variables. In the equation
$$z_{ji} = y_{ji} - \beta_{j1}x_{ji1} - \beta_{j2}x_{ji2} - \beta_{j3}x_{ji3} - \beta_{j4}x_{ji4}$$
the data $y_{ji}$ and $x_{jir}$ $(r = 1, 2, 3, 4)$ are raw observations, so that $x_{ji1} = 1$ for all $j = 1, 2, 3$ and $i = 1, 2, \dots, n_j$. In this example $k = 3$, $p = 4$, $n_1 = 16$, $n_2 = 15$ and $n_3 = 16$. The matrices $S_j$ and $X_j'y_j$ $(j = 1, 2, 3)$ of the computed sums of squares and products about the origin are

S_1 =       16·0       286·8       139·0       4,835·0        X_1'y_1 =       97,500
           286·8     5,340·4     2,452·2      86,849·0                     1,788,052
           139·0     2,452·2     1,307·0      41,990·0                       838,010
         4,835·0    86,849·0    41,990·0   1,465,575·0                    29,484,809

S_2 =       15·0       244·6       236·0       4,625·0        X_2'y_2 =       83,470
           244·6     4,181·6     3,869·0      75,318·0                     1,404,814
           236·0     3,869·0     3,824·0      72,500·0                     1,320,100
         4,625·0    75,318·0    72,500·0   1,427,425·0                    25,727,050

S_3 =       16·0       256·0        97·0       2,995·0        X_3'y_3 =       89,280
           256·0     4,221·7     1,619·2      47,897·0                     1,456,596
            97·0     1,619·2       785·0      17,840·0                       554,650
         2,995·0    47,897·0    17,840·0     580,475·0                    16,743,450

where
$$S_j = (s_{jrs}), \quad s_{jrs} = \sum_{i=1}^{n_j} x_{jir}x_{jis} \quad (r, s = 1, 2, 3, 4) \quad\text{and}\quad X_j'y_j = \Big(\sum_{i=1}^{n_j} x_{jir}y_{ji}\Big).$$

It may be noted in the above that the element $s_{111} = n_1$, $s_{211} = n_2$, and $s_{311} = n_3$. The multiple regression equation for all three categories combined is given by (3·4), where it will be remembered that specification of the matrices $X^{\alpha}$ and $\beta^{\alpha}$ depends on the model defined by hypothesis. The data in the above matrices can be suitably arranged for analysis according to hypothesis.

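To illustrate the statistical method seven hypotheses are considered and tested. The hypothesis $H_1$ imposes no restriction on the $\beta$'s so that
$$H_1: \beta_{j1} = \beta_{j1};\quad \beta_{jr} = \beta_{jr} \quad (r = 2, 3, 4).$$
The other hypotheses are compared as null hypotheses against $H_1$:
$$\begin{aligned}
H_2&: \beta_{j1} = \beta_{j1};\quad \beta_{jr} = 0 \quad (r = 2, 3, 4); \\
H_3&: \beta_{j1} = \beta_{\cdot1};\quad \beta_{jr} = \beta_{\cdot r} \quad (r = 2, 3, 4); \\
H_4&: \beta_{j1} = \beta_{j1};\quad \beta_{jr} = \beta_{\cdot r} \quad (r = 2, 3, 4); \\
H_5&: \beta_{j1} = \beta_{j1};\quad \beta_{jr} = \beta_{jr} \quad (r = 2);\quad \beta_{jr} = \beta_{\cdot r} \quad (r = 3, 4); \\
H_6&: \beta_{j1} = \beta_{j1};\quad \beta_{jr} = \beta_{jr} \quad (r = 2);\quad \beta_{jr} = 0 \quad (r = 3, 4); \\
H_7&: \beta_{j1} = \beta_{j1};\quad \beta_{jr} = \beta_{\cdot r} \quad (r = 2);\quad \beta_{jr} = \beta_{jr} \quad (r = 3, 4).
\end{aligned}$$

The above statements of the various hypotheses all apply for $j = 1, 2, 3$. In stating these [...], for convenience, since in this example it [...] values. Table 5·1 presents the [...] tests of significance of the various hypotheses. Table 5·2 presents the [...] coefficients of the various hypotheses. The specification of the matrices [...] Table 5·2; those for the other hypotheses follow on the same lines.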
Table 5·1. Analysis of variance table for tests of various null hypotheses
$H_\alpha$ ($\alpha$ = 2, 3, 4, 5, 6, 7) against an alternative hypothesis $H_1$


Variation dus to D.F Sum of squares F 
—— |- | P 
Hi fach | ke $ | 1,556,805,752 | 
Bip =O (r= 2, 3,4) | | | 
g | 3 
r- pk—k= 9| 24,992,970 87041 967%" 
—— eS — Qd Pi 
1 553,97 
p- 3 
p * - 
css — —| = sak 
Diff. H,, Hy pk-p 8 833,796 2904 3°63 
l 
Hy: Bn= f k 3 
Bir = By (r= 2; 3, 4) p-1 3 
p: ptk-l 6 "- 
——— = . 3.88** 
Diff. Hy, H, (p-1)(k—1) 6 667,568 | 2325 3 
— — — "MIS e a a a) [a — 
As: By — fl k 3| y'X; SiiXiy 1,556,805,752 | 
Bip = ly (7 = 2) bk 3) BX iy 23,923,966 
=f., (r =3, 4) | p-? 2 | Xiiy 493,550 
Bs p+2(k—1) 8 gs: 1,581,223,26 
À I ^ 
Diff. Hy, Hy (0-3)-1) 4 | GS fs Ss = 6271, 5) 575,45: 
ee c NU. 
Hy, =f, " F 
6 poe _— : 1,556,805, 
jr JE M. TRUM c 24,34 fi 
we gc 24,349,074 
Em CHE AE 28 1,581,154,826 
Diff. Hy, He (p-2)k 6 643,896. 
Ello ee tat 
Hy: By — fi L^ 
7 ^R = n Bos k 3 1,556,805,752 
Bir r (r=2) 1 i1 MA AHG 
— f. (r8, ) x i x 2,450,676 a 
wee Eu 3(p-1) 6 22,461,474 
= 461,479 
p 2p+k—1 


Diff. H,, H; 
E—— d | 


Ay: Ba — fn | 


p(k—3)— (k— 1) 


cL Il 


80,815 


m a 


pH 


pae ; k 3) yXSyuXy 1,556,805,752. | 
ir = Big (0 = 2,3, pk—k 9| &x | 
2 pk- - DERE 24,092,970 
i o ZL Basi 1,581,798,722 aA 
-— yn led nS SU ue T ei 
Dif N-»k 35| y'y-QGnSh = (N pay ĝe 1,004,084 EIL 
» $»704 = 
3 Il 
Total N=amtnytn, 47 | y’ 
yy 
| 1,582,803,706 
** Significance at 0·01 probability level.

Using the 0·01 probability level for significance, and the 0·05 probability level for caution, it is concluded, from Table 5·1, that

(1) The regression is real; reject $H_2$.

(2) One set of regression coefficients, including equality of means, cannot adequately represent all three categories; reject $H_3$.

(3) One set of regression coefficients is not adequate even after allowing for differences in mean value for each category; reject $H_4$.

(4) One set of regression coefficients for variables $x_3$ and $x_4$ for all three categories cannot be used; reject $H_5$; further

(5) the regression coefficients for $x_3$ and $x_4$ cannot be ignored; reject $H_6$.

However,

(6) the use of one regression coefficient for the variable $x_2$ and different ones for $x_3$ and $x_4$ and for the constant term is adequate: accept $H_7$.
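
The ratios behind these conclusions can be recomputed from the sums of squares of Table 5·1. The following is a minimal sketch, assuming the residual sum of squares is the total $y'y$ less the sum of squares under $H_1$, on $N - pk = 35$ degrees of freedom; the variable names are hypothetical, and the figures are those that can be read from the table for $H_2$, $H_3$ and $H_4$. For each hypothesis $J$ is the difference in sums of squares divided by $\hat\sigma^2$, and $F$ is $J$ divided by its degrees of freedom.

```python
# Sketch: recompute J and F of Table 5.1 from its sums of squares.
# Variable names are hypothetical; the figures are read from the table.

total_ss = 1_582_803_706          # y'y, N = 47
h1_ss = 1_581_798_722             # sum of squares under H1, pk = 12 d.f.
resid_df = 47 - 12
sigma2 = (total_ss - h1_ss) / resid_df   # residual mean square

# hypothesis: (difference in sum of squares from H1, degrees of freedom)
differences = {
    "H2": (24_992_970, 9),
    "H3": (833_796, 8),
    "H4": (667_568, 6),
}

def j_and_f(diff_ss, df):
    """Return J = diff_ss / sigma^2 and F = J / df."""
    J = diff_ss / sigma2
    return J, J / df

for name, (diff_ss, df) in differences.items():
    J, F = j_and_f(diff_ss, df)
    print(name, round(J, 2), round(F, 2))
```

The values so obtained agree, to within rounding, with the $J$ and $F$ entries that remain legible in Table 5·1.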


Table 5:2. Estimates of regression coefficients under various hypotheses 


Hypothesis | Ba | Bia Bis Bis 
| Gaai 
| . " 
Wy: j= 3586 203-4 — 10:69 gr 
j22 — 1186 | 231-1 79-02 25:10 
j=3 1654 227-7 =p 8 
6094 
| 5504 | 
5580 | 
| a 
| —11-19 0-647 
Haij21,2.9 | 2009 
| 3-71 1.28 
1803 3-71 1-28 
1589 371 128 
| 1862 | 
7 | m ee era 
, j 34-64 | 2:43 
H,:jzl 1349 34-64 2-43 
j=2 617 34-64 2-43 
j=3 1625 | | 
S| = MER, a 
— amma (il 
| 2- 
2467 | Hh 
1873 | ET 
2001 a 
m si yl 
vi aes m E m — 4:28 —4-10 
. 9-8 : í 
Hz 3431 | de 19:26 2 e 
— 6767 | Suns —4-15 ZI 
| 1758 219:8 
bci | —— P 


e the matrix of parameters p. 
noted that since we are dealing 
ts Jj, of P and the vector 


in the exampl 

j . Tt will be 

€ matrix of observations X are cum ression coefficien 

aw observations in the example, the reg 1 is provides for the usual 
P X 

jed Tor eT as f deviations about average values to 


ucts O: 
ares and prod Biom. 44 


TF 
and th the hypotheses H, and H; consi 


With = 


OE n = 1, 2,3) were partitior 


Tact 

i n 

Ce of obtaining sums of squ 
6 
simplify further calculations by reducing by one the rank of matrices S (of sums of squares
and products) whose inverses must be obtained. 


H: Pn = Bj: Bir = Pir (r = 2,3,4; j = 1,2,3), 
B’ = (BLB) X = (X,, Xo), 
BL = (Fiv Bor: Psi) Bz = (Bio: Bae, B32) Bs T (Bè Bia: Pj) 


Xu . 0 Xu s 0 
Xucm|: GR sk Gub e Be ol 
0 s Xg Ü a X 


2 


Sire CL Deeg ll) X; = (Xj. Xj X54); 
order 1x nj, Xj, = (jp Vjops e Egg); 


Hs [Üa—Ü By =By (r2. Br=Phy (r-3,4j21,2,3) 
B’ = (Bi Ba Ba) X = (X, X, X) 
Bi L3 (Piw Bors P3). Ba = (Ayes Bop, Pas); Ba = (83,2 a). 


R PUE TM EET n. 
2954. Sep = | Xem oe Sey eh Xp = (Kr %q) 
0 . Xy 0 i 34 ja ja Xj4)> 


Xj, and x; (r = 2,3,4), are defined as under Hi. 


In the foregoing example each hypothesis on the parameters applied to all categories $j = 1, 2, 3$. It is clear, however, that this need not be the case for the theory and method are equally applicable for any assertion of the hypotheses with regard to the parameters. For example, we might have considered a case where part of the hypothesis concerned equality of the parameters for certain of the categories, but not for all, e.g.
$$H: \begin{cases} \beta_{jr} = \beta_{\cdot r} & (j = 1, 3;\ r = 1), \\ \beta_{jr} = \beta_{jr} & (j = 2;\ r = 1), \\ \beta_{jr} = \beta_{\cdot r} & (j = 1, 2, 3;\ r = 2), \\ \beta_{jr} = \beta_{jr} & (j = 1, 2, 3;\ r = 3, 4), \end{cases}$$
and analysis by the three-partition sub-hypothesis procedure of §4 would apply.


The authors wish to express their sincere appreciation to the computing staff at the Naval
Proving Ground, Dahlgren, Virginia, and to Mr Fred Okano for their able assistance in
carrying out the computations.
BIVARIATE STRUCTURAL RELATION


By R. L. BROWN 


British Coal Utilization Research Association, Leatherhead, Surrey 


SUMMARY. Given $n$ observations $(x_i, y_i)$, each subject to an error of measurement ascertained independently, the existence of a structural relation for the true points $(X, Y)$ is discussed. For a linear relation a $\phi$-number is defined from which either a confidence or a structural theory can be developed. The relation is treated in toto. Structural theory is described in terms of three hypotheses of regularity, which enable confidence and fiducial theories to be contrasted. It is shown that the theory provides a solution for the curvilinear relation wherein all the alternative hypotheses are incorporated, and it is suggested that the latter is a necessary requirement. The alternative hypothesis implicit in the $\phi$-number treatment of the linear relation is deduced from the general theory.


1. THE PROBLEM 


Given $n$ measurements $(x_i, y_i)$ of two variates, each subject to error, and some theoretical basis
for a structural relation governing the true points $(X_i, Y_i)$, it is a fundamental problem in the
physical and engineering sciences (Brown, 1955) to establish whether or not the structural
relation is in accord with the evidence. One simple form of this problem is considered in the
present paper; it is supposed that the conduct of the experiment has been judged reliable
in the context of existing knowledge and that the data are so organized that the distribution
of errors can be ascertained and adopted as a quantitative measure of reliability. The
problem thus formulated was suggested by an experiment on coal breakage (Brown, 1947).
Previous work has been limited mainly to the linear relation and its main objective
has been to estimate a best relation. When the method of maximum likelihood is used it is
found that the best relation passes through the mean of the observations; the limits obtained
for the coefficients defining the relation are then conditional on this requirement. Moreover,
it is usual to find limits for the coefficients separately; this procedure destroys a
eun iral vot aaa ciae sales e 
heats fine pum s iae e S i m np information on the erro! Morte 
arguments; Moran (1956) has moentdy umm ior " E S inis us haem jns y 
well known that in the linear case the ratio of the error variances suffices for the estimation
of a best relation; here the alternative hypotheses are not obvious and it is even doubtful
if they can be formulated.


2. LINEAR RELATION

2·1. $\phi$-number

If the data are normalized and the error* in $x$ is distributed $N(0, 1)$ and that in $y$ as $N(0, 1)$, then the perpendicular from an observation $(x, y)$ on to a line
$$Y = \alpha_0 + \alpha_1 X \qquad (2\cdot1)$$
is distributed $N(0, 1)$. Since, as Lindley (1947) has pointed out, it is not necessary to assume anything about the true points $(X, Y)$, part of the difficulty of the problem is removed. Putting
$$n\phi = \sum_i \frac{(y_i - \alpha_0 - \alpha_1 x_i)^2}{1 + \alpha_1^2}, \qquad (2\cdot2)$$

* This conventional formulation is discussed further in § 3·1.
we know that $n\phi$ has a $\chi^2$-distribution $Z(\phi)\,d\phi$. Then the numerical* statement
$$\Pr(\phi < \phi_p) = \int_0^{\phi_p} Z(\phi)\,d\phi = p \qquad (2\cdot3)$$
associates a probability $p$ with a chosen $\phi_p$. Notice that the differential element $d\phi$ is mixed, depending on both the observations and the coefficients of the line. Writing $nS_{xx}$, $nS_{xy}$, etc., for the sums of squares and products of the observations and taking the origin at the mean, we have
$$\phi = \frac{\alpha_0^2 + \alpha_1^2 S_{xx} - 2\alpha_1 S_{xy} + S_{yy}}{1 + \alpha_1^2}. \qquad (2\cdot4)$$
A confidence region for the coefficients $(\alpha_0, \alpha_1)$ jointly can be written down immediately from equation (2·4). It is the non-central conic,
$$\alpha_1^2(S_{xx} - \phi_p) - 2\alpha_1 S_{xy} + \alpha_0^2 = (\phi_p - S_{yy}). \qquad (2\cdot5)$$
The coefficients $(\alpha_0, \alpha_1)$ of all lines for which $\phi < \phi_p$ give points lying inside the conic, leading to the usual assertion that the coefficients lie in this region in a proportion $p$ of cases in the long run. The region is bounded if the conic is an ellipse.
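
The equivalence of (2·2) and (2·4) may be checked numerically. The following is a minimal sketch; the function names are hypothetical, and the data are those of § 2·3 as they can be read from the scanned text (one value of $x$ may be misread, since the printed mean of $x$ is 9·57). It computes $\phi$ for a line directly from the perpendicular distances and again from the second moments about the mean. It also finds the smallest attainable $\phi$, taken here to be the smaller latent root of the matrix of second moments, together with the slope of the line through the mean that attains it.

```python
# Sketch: the phi-number of a line, computed two ways.
# Function names are hypothetical; x and y are the Section 2.3 data
# as they can be read from the text.
from math import sqrt

x = [1.8, 4.1, 5.8, 7.0, 9.3, 10.6, 13.4, 14.7, 18.9]
y = [6.9, 12.5, 20.0, 15.7, 24.9, 23.4, 30.2, 35.6, 39.1]

def moments(x, y):
    """S_xx, S_xy, S_yy about the mean, each divided by n."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((u - xb) ** 2 for u in x) / n
    sxy = sum((u - xb) * (v - yb) for u, v in zip(x, y)) / n
    syy = sum((v - yb) ** 2 for v in y) / n
    return sxx, sxy, syy

def phi_direct(a0, a1, x, y):
    """(2.2): mean squared perpendicular distance, origin at the mean."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    total = sum(((v - yb) - a0 - a1 * (u - xb)) ** 2 for u, v in zip(x, y))
    return total / (n * (1 + a1 * a1))

def phi_summary(a0, a1, sxx, sxy, syy):
    """(2.4): the same quantity from the second moments alone."""
    return (a0 * a0 + a1 * a1 * sxx - 2 * a1 * sxy + syy) / (1 + a1 * a1)

def smallest_phi(sxx, sxy, syy):
    """Smallest attainable phi and the slope of the line that attains it."""
    root = sqrt((syy - sxx) ** 2 + 4 * sxy * sxy)
    phi1 = 0.5 * ((syy + sxx) - root)
    return phi1, (syy - phi1) / sxy

sxx, sxy, syy = moments(x, y)
phi1, a1 = smallest_phi(sxx, sxy, syy)
print(phi_direct(0.5, 2.0, x, y), phi_summary(0.5, 2.0, sxx, sxy, syy))
print(len(x) * phi1, a1)
```

Applied to the summary values printed in § 2·3 ($nS_{xx} = 238$, $nS_{xy} = 451$, $nS_{yy} = 906$), `smallest_phi` gives a smallest root near 11 and a slope near 1·98, in agreement with the values quoted there.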


c 2-2. Acceptance conic 
' 
or : MEM : E 
fidence theory is not developed further in this study of structural relations since it 
a physicist who pictures the problem as 


c 
es correspond to the habit of thought of ler 
(a. fined graphs in the space of true points (X,Y). He would regard the coefficients 
Me evan if predicted from a valid mathematical theory (Jeffreys, 1948, pp. 283-5), 
Vie and would wish to translate the Q-number (24) in terms of the probability 
nent (2-3) into a family of lines acceptable at probability level p (cf. §3-1). Such a 
t space is easily effected by 


ransfor à I— s 
dnos o Mation of the probability distribution into true poin 
ing the envelope of lines (2:1) subject to the constraint (2-4) on the coefficients. This 


ives 
es the acceptance conic, us 
(Y-a,XP (m Y+X"" _ ap, (2:6) 
(,— 9) (9s — 9p) 
Where 1 š 2 q2 }} 27 
di—5g- [(Syy See) + ((8,, — Sex) TASTE (27) 
28, 

S q g 2:8 
$i = Sre— 5 Qs = Sy, t @ 15er (ne) 


and it; | o 
si it is assumed that a,» 0, ó, <e Equation (2:6) does not contain the unknown coeffi- 
“ents (a, a ) i i.. s P1 72: 

n KAN 
ne , 
d acceptance conic could be regarded as à tr 

ence conie (2-5), and, as such, it could be given à 
VR by the disappearance of the unknown E 
“ue "Ing the envelope. But the envelope 15 me to be cs doe 
rod. i is determine 
z nt Space having D = dy and the locus 1$ 


2 i „y appears fe 
Yi). Thus an interpretation analogous to fiducial theory ap} ! 
d to numerical measure; cf. the third regularity 


to true point space of the 
foundation. This view is 
in the process of 


ansformation in 
confidence 
efficients (xo. %1) 
arded as the locus of all lines in 
articular observations 


asible. In the author’s 


* 
In 
£g 3+] the sense of indifference to content as oppose 
Thai 
e limi n 82:2. 
limits are the same as those found in § 2-2- 
view, however, the fiducial method is not applicable since structural coefficients differ in kind from population parameters; in § 3·1 it is suggested that the acceptance conic might be interpreted in terms of new hypotheses of regularity. Meanwhile, it may be noted that, on account of the existence of a $\phi$-number, the alternative interpretations yield the same limits for $(\alpha_0, \alpha_1)$. It appears worth while to show that the acceptance conic has satisfactory properties. These are:

(i) the relation has been treated in toto,

(ii) the central conics for $\phi_p$ differing are confocal,

(iii) for $\phi_1 < \phi_p < \phi_2$ the conic is an hyperbola and all lines having $\phi < \phi_p$ lie inside the hyperbola, which therefore bounds all acceptable relations,

(iv) for $\phi_1 > \phi_p$ the conic is imaginary and it follows that there is no acceptable linear relation,

(v) for $\phi_p > \phi_2$ the conic is an ellipse and every line outside the ellipse has $\phi > \phi_p$; here it might be said that the line is indeterminate in direction, since the range of the points was not large enough in comparison with the errors; the ellipse is then the limiting boundary of the mean of the observations.


me T d " es 
When the acceptance conic is an hyperbola, the limits for «, are given by the asy mptot 


-— tx (9s ġ )4($,— 91) (2:9) 
MAE Sree 


and for the intercept J at X = h, 


kI = a (ds — d.) + [(1 +44) ($a — $y) (bp — $4) e -- 213-3), e 
k= ($a — $y) -ailp — d). 


These intercepts do not belong to parallel lines. Notice that they depend on one probability 
level p. The axes of the hyperbola are 


where 


-a= nY-X-9, (im 


e 
and these can be called respectively the ‘best’ and ‘worst’ lines through the mean of th 
observations. But the ‘best’ line is not hi 


e ions ere estimated in the usual sense. 
Next, if r is the correlation coefficient of the observations, 


t= (-£) (1-$2)> (6-2) (1-52). (12 


it being intuitive that 4, < 


t H i n 
Sza Syy. This might be regarded as a test; for a unit populati^ 
correlation (p = 1). In the example of $2-3 below, r? — 0.94 and ( -£&) ( -£) = o 


Syy 
at dein et (2-12) makes use of the errors of measurement and provi 
tes erent from that derived from the bivariate normal distribut; : anv 

; al i ca 
be unity. distribution, wherein p 
Lastly, 


a 
es 
d P 


the method may be shown to be consistent. Thus the pairs of sums 
is Sx; Sys (e4 S y +a); Big; (Sx x -- 1); Bus 04S xx; S (Syy tat 1) 

" ER ind e T" 
converge in probability to the same values, provided the e 
observations. Hence the pairs a, 


93; Pr l; py, Syy 
same values. Also the hyperbola tends in ded 


the 
rrors are not correlated t° pe 
xx(1 +a), converge in probability °° 
the limit to the true line Y-oX repeated: 
oa is defined by the five statistics T, J, ay, Q4. $p and these are unbiased and 
bom n ^ probability. These statistics give a sufficient expression of the data for the 
s ies e posee. through the ó-number, a test for the existence of a linear structural 
bee n. D evertheless, I consider that nọ has a X2-distribution. This question of degrees 

reedom is discussed more fully in the extension of the problem to acceptance quadries 


in m di i ioh TE ish i 
m dimensions which it is hoped to publish in due course. 


2·3. Example

The following data were prepared with the aid of a table of random normal deviates, the errors in $x$ and in $y$ being taken $N(0, 1)$:

x     1·8    4·1    5·8    7·0    9·3   10·6   13·4   14·7   18·9
y     6·9   12·5   20·0   15·7   24·9   23·4   30·2   35·6   39·1

We find
$$n = 9, \qquad \bar x = 9\cdot57, \qquad \bar y = 23\cdot14,$$
$$nS_{xx} = 238, \qquad nS_{xy} = 451, \qquad nS_{yy} = 906,$$
$$n\phi_1 = 11, \qquad n\phi_2 = 465, \qquad a_1 = 1\cdot98.$$

Now $\chi^2$ at $p = 0\cdot05$ gives $n\phi_p = 20$. Obviously there are acceptable straight lines, the acceptance hyperbola being
$$(Y - 2X)^2 - 0\cdot02(2Y + X)^2 = 5$$
referred to the mean as origin. The limits for the slope are
$$\alpha_1 = 1\cdot5 \text{ to } 2\cdot7,$$
and for $\alpha_0$ (at $h = 0$)
$$\alpha_0 = \pm 2\cdot2.$$
The true line $(Y - \bar y) = 2(X - \bar x) - 1$, from which the data were derived, does not meet the acceptance hyperbola. (See Fig. 1.)


2.4, Regression case (c, = 9 TyF 1) 
ee nọ = y (yore at) (2:13) 
i 
ae Xn-distribution. Putting | Q6 Uh 
a= Bu aw = SeS- = EST] ti Yi : | , (2:14) 
FN Pr ac v yj 
(2:15) 


the y-a,XY X od 
acceptance conic is a Sa , 
s m is zero if the pair of 


Which :4 qp the type ter 
e that in w vertus A s 
Points he mean. The «pest? line is the regression line 


\ (2:16) 


as before. Notie 


may be interpreted 
gh t 


s (tay), (2j, yj) lie on a line throu, 
14 = 0, The limits for # are A 
limits a, = 4+ 


and f 
9r the intercept J at X = h, n\t (2:17) 
I-2aht fi» =t) ( «à ` 
: s to & 
imi at corresponds 
se limits, an outcome tha 
i zs the range of both these 
Increasing S,, narrows za i 
5 f a physicist. u O—- 
E e cà in ie E direction is based on a limited number of degr € = : 
erro B aren 3 diese Rosita wf de 
bes ones with Snedecor's /-distribution replacing the y?-distribu ¢ 
same resu 
Fig. 1. Example of acceptance hyperbola derived from a $\phi$-number.


2-5. Further development pot 
$ an 

number having a simple probability distribution € tp? 

be found. This arises when estimates, each on m degrees of freedom 

error variances. Suppose the data to have been studentized, Then if 

from the ith observation on to the line (2-1) 


Consider now a case when a $- ot 
, are available m 
p; is the perpent 


o? 
D; = Óx,sinü— dy; cos 0, 


are the errors of the ith observ: 

measurements, p; is distributed N 

tributed x? (s? sin? 9 + 60? cos? ()). 
Now if àx;, dy, are the errors in E 


vat 
" . ing M ae 
where dx;, dy; ming there is no bias. gjë 


2:2) I? 
; and thence nọ in (2 2) 


ation. Hence assu 
(0, 02 sin? 0 + 07 cos? 0) 

"n 
= s = | were 85 9 


y from which the estimates s2 2 
' i rs d 
Ps = ôx; sin 0 — ôy; cos 0 
is normally distr t t ^ 2 i t I K 
lly distribute! e 1s n1 I t 
rib 
uted, and herefore > Ps distrib 
ibuted as c. -squa ed. Bu 


a(S 2) = 2 Q2 
Provided ups m(o2 sin? 0 4- a3 cos? 0), 
22 
E(dx,dy,) = 0. (2:20) 
(2-21) 


Which is 7 " | 7 
8 usue se. S 12i 
sually th i i 
e case. Since the observ ations (x; Wy ) are student 
n ir Jil © udenti ed 
, 


a " m 
x ox? = d =m. 
a 2.22 
t m | 
v Oa dy) = | 
" Xn y, mh. 2 
S (dz, si a 
3 (02,5 — dy, P i 
zr. sin 0 — ôy, cos 0)? = m(1— 2hsin 0 cos 0) ( 
2-24) 


has 
asa y? (oi META 
š s 2 2 B 
r SIN“ O+ ey cos” 0) distribution. Thus 


Am 


na 
= 2h24 (2:25) 
14oj 


ha 
San F 
Thi P'(n, m) distribution 
8 result i i | 
Pte x is of little value since the qu 
; physical signifia: 
Paetus ioi = significance under usual conditions of experimentation 
plications arise v it i ; : 
S vhen, (a) it is not assumed that tl 
» ‘he measurement; 
s are free 


fro 
m big 
»e degrees of fr i i 
jus grees o freedom with wh 2 se "mi 
vir boi even hich sł, sj are determined are not equal 
sthods proposed here, roximate distributi " 
ribution theory 
y 


Ww : 
: ould appear to | coupled with appr 
ot imuspisdiod ead to approximate tests. Perhaps the lack of an exact statisti 

:d, since the organization of additional inform apa 


me. 

asur tion i 
enis A ation in the for: 

de nt requires pue D. oem rm of error 

ep $ quires a combination j x ilisti rs of 

P consideration.* a of scientific and probabilistic languages that merits 


antity A would not usually be known h has 
: no 


ILINEAR RELATION S 


ses of regularity 

ce of a ġ-number, leading to a con- 
tisfactory approach to structural 
st be capable of leading to 
s necessary even in the case of the linear 
ative hypothesis of a structural, but 
the ġ-number. A new theory 


3. CURVY 
3-1. The three hypothe 


e to the existen 


c. But any S% 
and engineers, MU 


The simpli 

id is. iia of the linear problem is du 

relations heory or to an acceptance coni 

curvilinear . they are understood by physicists 

"elation pe Systems, A more general treatment i 

non-linear. it wil have been noted that the altern 
ar, relation is not encompassed in the derivation from 


Will 1 
low be o; 
be given for the general relation 
"EL (3-1) 


Y = f(X. a) (J 
„derlying his $ 


elationships w 
are not uniquely determin 


cientific thought. In 


ed; their values are 


[n 
th pattern of 1 


physici 
physicist, there is à 
ters X; 


er 
word. 
" : 
, the disposable parame 
in their discussion 
seus of 


* 

the Kendal] (195 
951, 1952) and others have approached tentatively this question 

encountered in pract ice. See also §3-1. 


eren 
t types of variate that are 
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uncertain.* Moreover, the functional forms f(X,«;) can translate into each other, usually, 
however, within a limited class of functions such as wave or power laws. — 
The meaning of (3·1) can be clarified by enunciating three hypotheses.

(i) Regularity in the succession of observational data. This hypothesis is implicit in confidence theory in so far as it is considered to yield results valid in the long run. Adopting a much simplified argument, if $X$ is a true value, $x$ an observation, then
$$\epsilon = X - x \tag{3·2}$$
defines an error $\epsilon$. In confidence theory $X$ is regarded as knowable absolutely and the relation $d\epsilon = -dx$ puts all the error on to the observational data. Then the probability distribution $p(\epsilon)\,d\epsilon$ is regarded as established by repeated observation.

(ii) Regularity in the totality of coexisting patterns of structural relations. If the observational data are regarded as known and fixed, i.e. a subset of all possible data is agreed to be chosen, then the relation $d\epsilon = dX$ yields what may be called a structural theory, the development of which has been the objective of this paper. As an example, consider the class of structural relations that are linear. This class is abstracted from a wider class on the supposition that the disturbances of linearity are many and small. Such abstraction may be considered valid if account be taken of the probability distribution $p(\epsilon)\,d\epsilon$, found now from the central limit theorem and not by repeated observation.

(iii) Regularity in the recurrent alternations between underlying patterns and observational data. Alternation of theory and experiment is a recognized element in scientific work. It reconciles the opposition of the first two regularities, taking into account that patterns in their totality and the observational data in all their possible actualizations are both unknowable, though for different reasons. Recurrent alternation is the true source of the error $\epsilon$ and its probability distribution, which strictly apply to neither of the first two regularities.


If $\epsilon$ is distributed $N(\mu, \sigma)$, then the relation $d\epsilon = d\mu$ would appear to lead to fiducial theory; the author at present considers that the fiducial theory of the location parameter $\mu$ is a structural theory in the sense defined here, whereas that of the scale parameter $\sigma$ may be a distinct theory, neither confidence nor structural.


It will be noticed that the first regularity cannot in fact yield results 
although the introduction of randomization 


7 
nay 


space-time, with a consed | i, 
oe o 
ion is needed to accommo” c. 


3 "ame. 
lucis cappa E 5 : "ue value from the location pare” ig 
The distinction is not trivial. See also the 'Principle of Limited Variety' in C. D. Broad's presidential address to the Aristotelian Society (1927-8).


T For example, Kaluza (1921), Caldirola (1942 ,B 5 " thers “+g: 
explored the scope offered by a five-dimensional framework ei Thring (1949), end i en i 
relativistic notion of repetition due to Bennett (1957), namely, that the moment of observation, with its patterns of potential possibilities, is multi-valued in its actualizations.

It follows from the second regularity that in a structural theory the relationship should 
be treated in toto and should embody the alternative relationships within the class 
abstracted from the totality of all possible patterns. 

The question then arises as to how an independent and relevant assessment of the probability distribution can be made. This might be derived theoretically or experimentally. The former rests on the central limit theorem. In the latter case, consider the coal experiment that led to this investigation, where the errors were obtained from measurements of sub-populations clearly defined by an attribute meaningful in respect to the measurements $x$, $y$ and the structural relation under consideration. The attribute was determined by the third regularity. In the absence of such an attribute it is questionable whether an adequately reliable experiment can be done; the adaptation necessary in Wald's (1940) treatment, in which the observations $(x_i, y_i)$ are divided into two sets, is not obvious and perhaps not appropriate.
3·2. Structural theory

Suppose that $n$ measurements $(x_i, y_i)$ and independent information relating to the distributions $p(\epsilon)\,d\epsilon$, $q(\delta)\,d\delta$ of the errors $(\epsilon, \delta)$ associated with $(x, y)$ respectively are given, these being supposed to be free from bias. When $X$ is given in (3·1), $Y$ can be calculated. Let $X_i$ $(i = 1, \ldots, n)$ give the location of the true points corresponding to the observations $(x_i, y_i)$. Regarding the coefficients $\alpha_j$ $(j = 0, 1, \ldots, n-1)$ as lying in an $n$-dimensional coefficient space, the true coefficients may be thought of as a point. Then the errors $\delta$ are given by variation of points in the coefficient space, i.e. by variation of $\alpha_j$; hence the $2n$ errors are
$$\epsilon_i = X_i - x_i, \qquad \delta_i = f_i - y_i, \tag{3·3}$$
where $f_i$ is evaluated at $X = X_i$.

The joint probability distribution of the $2n$ errors $\epsilon_i$, $\delta_i$ is
$$p(d\epsilon_1, \ldots, d\epsilon_n, d\delta_1, \ldots, d\delta_n) = \prod_{i=1}^{n} p(\epsilon_i)\,q(\delta_i)\,d\epsilon_i\,d\delta_i. \tag{3·4}$$
Transform to the $2n$ variables $X_i$, $\alpha_j$. The Jacobian is easily seen to be the $n \times n$ determinant
$$\left\| \frac{\partial f_i}{\partial \alpha_j} \right\|,$$
and the following equation is obtained:
$$p(dX_1 \ldots dX_n\, d\alpha_0 \ldots d\alpha_{n-1}) = \left\| \frac{\partial f_i}{\partial \alpha_j} \right\| \prod_{i=1}^{n} p(X_i - x_i)\,q(f_i - y_i)\,dX_i \prod_{j=0}^{n-1} d\alpha_j. \tag{3·5}$$
This equation, which is basic for what follows, is interpreted as a probability density in the $2n$-fold of the $n$ true values $X_i$ and the $n$-dimensional coefficient space $\alpha_j$. The $X_i$ locate the points on the curve and the $\alpha_j$ locate the curve having the functional form given in (3·1).


Suppose now that the structural relation is a simple additive function of the disposable coefficients $\alpha_j$, i.e.
$$f(X) = \sum_{j=0}^{n-1} \alpha_j\,\phi_j(X), \tag{3·6}$$
where $\phi_j(X)$ are $n$ independent functions of $X$ only. Then
$$p\Big(\prod_{1}^{n} dX_i \prod_{0}^{n-1} d\alpha_j\Big) = \|\phi_j(X_i)\| \prod_{i=1}^{n} p(X_i - x_i)\,q(f_i - y_i)\,dX_i \prod_{j=0}^{n-1} d\alpha_j. \tag{3·7}$$


It may be noted that the $n$ functions $\phi_j(X)$ contain in themselves all the alternative hypotheses, as is necessary in a structural theory.

To proceed further from equation (3·5) or (3·7) it is necessary to introduce an hypothesis for the true points $X_i$. Strictly, this hypothesis should be related to the way in which the experiment is carried out. Thus a physicist would endeavour so to space his data along the curve as to give it good definition. However, it does not appear to be convenient to introduce such information into a general treatment, especially bearing in mind that it is uncertain whether the physicist has achieved (or even, can achieve) his objective.

So long as there are $n$ disposable parameters $\alpha_j$, the most probable positions of $X_i$ are the observations $x_i$ themselves. But given an hypothesis that in the coefficient space of the $\alpha_j$ there is a region such that some of the $\alpha$s are zero, then the likelihood may be maximized with respect to the $X_i$, leaving a joint probability distribution for the remaining $\alpha$s.

A third method is to suppose that each true point can be located anywhere on the curve and to form the $n$-fold integral of equation (3·5) or (3·7) over the range of possible variation of each of the $X_i$. This method is adopted here. It has the merit of stressing the structural nature of the relation (3·1).

3·3. Two observations and normally distributed errors

When the errors are normally distributed and adjusting the scale of the observations so that both standard deviations are unity,
$$p(\epsilon)\,d\epsilon = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12\epsilon^2}\,d\epsilon, \qquad q(\delta)\,d\delta = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12\delta^2}\,d\delta. \tag{3·8}$$
Here bias has been removed by taking zero means. Consider the linear relation
$$Y = \alpha_0 + \alpha_1 X. \tag{3·9}$$
With two observations, equation (3·7) becomes
$$p(d\alpha_0\,d\alpha_1) = \frac{1}{(2\pi)^2} \iint (X_2 - X_1) \exp\Big[-\tfrac12\big\{(X_1 - x_1)^2 + (X_2 - x_2)^2 + (\alpha_0 + \alpha_1 X_1 - y_1)^2 + (\alpha_0 + \alpha_1 X_2 - y_2)^2\big\}\Big]\,dX_1\,dX_2\,d\alpha_0\,d\alpha_1, \tag{3·10}$$
whence
$$p(d\alpha_0\,d\alpha_1) = \frac{\alpha_1(y_2 - y_1) + (x_2 - x_1)}{2\pi(1 + \alpha_1^2)^2} \exp\Big[-\frac{(y_1 - \alpha_0 - \alpha_1 x_1)^2 + (y_2 - \alpha_0 - \alpha_1 x_2)^2}{2(1 + \alpha_1^2)}\Big]\,d\alpha_0\,d\alpha_1. \tag{3·11}$$
As might be expected, the probability distribution of $\alpha_1$ is independent of $\alpha_0$, whereas the distribution of $\alpha_0$ contains $\alpha_1$.
Transform to independent variates $u$, $v$ by writing $\tan\theta$ for $\alpha_1$ and
$$\sqrt{2}\,u = (x_2 - x_1)\sin\theta - (y_2 - y_1)\cos\theta, \qquad \sqrt{2}\,v = (2\alpha_0 - y_1 - y_2)\cos\theta + (x_1 + x_2)\sin\theta. \tag{3·12}$$
$v/\sqrt{2}$ is the length of the perpendicular from the midpoint of the join of the two observations on to the line (3·9) and $u\sqrt{2}$ is the length of the projection, perpendicular to the line (3·9), of the join of the two observations. Then $u$, $v$ are each normally distributed with zero mean and unit variance. The modal values of $u$, $v$ correspond to the line passing through the observations; for this line
$$\alpha_0 = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}, \qquad \tan\theta = \frac{y_2 - y_1}{x_2 - x_1}. \tag{3·13}$$
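The geometrical reading of the transformation to $u$, $v$ can be checked numerically. The following is a minimal Python sketch, not part of the original derivation; the function names are hypothetical, and the signs follow the form of the transformation as read from the damaged text. It confirms that $u^2 + v^2$ equals the sum of the squared perpendicular distances of the two observations from the line, and that $u = v = 0$ for the line through both observations.

```python
import math

def uv_for_pair(p1, p2, alpha0, theta):
    """Return (u, v) for two observations and the line Y = alpha0 + X*tan(theta)."""
    (x1, y1), (x2, y2) = p1, p2
    s, c = math.sin(theta), math.cos(theta)
    u = ((x2 - x1) * s - (y2 - y1) * c) / math.sqrt(2)
    v = ((2 * alpha0 - y1 - y2) * c + (x1 + x2) * s) / math.sqrt(2)
    return u, v

def perpendicular_offset(point, alpha0, theta):
    """Signed perpendicular distance of a point from the line."""
    x, y = point
    return (y - alpha0) * math.cos(theta) - x * math.sin(theta)

def line_through(p1, p2):
    """(alpha0, theta) of the line joining two observations (x1 != x2 assumed)."""
    (x1, y1), (x2, y2) = p1, p2
    theta = math.atan((y2 - y1) / (x2 - x1))
    alpha0 = (x2 * y1 - x1 * y2) / (x2 - x1)
    return alpha0, theta
```

For any trial line, `uv_for_pair` and `perpendicular_offset` give the same total squared deviation, which is why the exponent of the two-observation density depends on the observations only through $u^2 + v^2$.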
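3·4. The Q-number for the linear relation

Given $2n$ observations and normally distributed errors, write
$$\sqrt{2}\,u_r = (x_{2r} - x_{2r-1})\sin\theta_r - (y_{2r} - y_{2r-1})\cos\theta_r, \qquad \sqrt{2}\,v_r = (2\alpha_{0r} - y_{2r} - y_{2r-1})\cos\theta_r + (x_{2r} + x_{2r-1})\sin\theta_r, \tag{3·14}$$
so that the observations $(x_{2r-1}, y_{2r-1})$, $(x_{2r}, y_{2r})$ are associated with the line
$$Y = \alpha_{0r} + X\tan\theta_r. \tag{3·15}$$
Then each of $u_r$, $v_r$ is distributed $N(0, 1)$ and they are evidently independent. Thus
$$2nQ = \sum_{r=1}^{n}(u_r^2 + v_r^2) = \tfrac12\sum_{r=1}^{n}\Big[\{(x_{2r} - x_{2r-1})\sin\theta_r - (y_{2r} - y_{2r-1})\cos\theta_r\}^2 + \{(2\alpha_{0r} - y_{2r} - y_{2r-1})\cos\theta_r + (x_{2r} + x_{2r-1})\sin\theta_r\}^2\Big] \tag{3·16}$$
is distributed as $\chi^2_{2n}$. Now for the linear relation
$$\alpha_{0r} = \alpha_0, \qquad \theta_r = \theta, \tag{3·17}$$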

and equation (3:15) becomes exact Also the 2, distribution yields a likelihood A in , or 
he rd i “sin $2. It is interestin 

p merical space: hence a structural theory for the use of ¢-numbers in 3 2. It is interesung 

i : . ved expresses the alternative 


© not ; š is result is deri 
e that > parti r way in which this resu : i à 
the particular way vairs of observations do not lie on a straight line 
a 


ay Pothesis i " 

sis in the form that one or more I Á uw à 
"Tough the mean. Tt follows that when in $2 there is no real locus of pei eng sree 
“ternative hypotheses of the form indicated in this paragraph must be e i 
he aea ttern of structural relations, e.g. that there 


unda; : rivi ^ > pa 
s mentally, alternatives deriving from en : oe gared. For the vates gern 


Sq : 
not Parabolic or higher degree relation, maj 
yet bee Ne hi 
h pcr ed bv examining whether 
he validity of the linear hypothesis (3:17) may also s di bcn ae an ifo 
lere p à „mal deviates. +0 , Stall 
are son «which u, v, are norma : s 
Mean sum > pa sm ^ I pz ere the /'s are cumulants, might be nir pem ar : 
: of squares and Para: B i - approximate in virtue o 
compared with Fw» known distributions. Such tests are only apl 


e $ 
Mation (3.15), 
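The statistic of this subsection, the sum of $u_r^2 + v_r^2$ over the $n$ pairs of observations, can be sketched in Python for the linear case in which every pair shares the same intercept and slope angle. This is an illustrative sketch only: the function name is hypothetical, the observations are assumed to be paired consecutively, and the errors are assumed to have been scaled to unit standard deviation as in §3·3.

```python
import math

def two_n_q(points, alpha0, theta):
    """Sum of u_r^2 + v_r^2 over consecutive pairs of observations.

    points : list of (x, y), even length, paired as (1,2), (3,4), ...
    alpha0 : common intercept of the line Y = alpha0 + X*tan(theta)
    theta  : common slope angle of the line
    """
    if len(points) % 2 != 0:
        raise ValueError("an even number of observations is required")
    s, c = math.sin(theta), math.cos(theta)
    total = 0.0
    for r in range(0, len(points), 2):
        (x1, y1), (x2, y2) = points[r], points[r + 1]
        u = ((x2 - x1) * s - (y2 - y1) * c) / math.sqrt(2)
        v = ((2 * alpha0 - y1 - y2) * c + (x1 + x2) * s) / math.sqrt(2)
        total += u * u + v * v
    return total
```

The returned value is zero when every observation lies on the line and is otherwise the total squared perpendicular deviation; under the stated assumptions it would be referred to a $\chi^2$ distribution with $2n$ degrees of freedom.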


wh 




3·5. The general case

For normally distributed errors, the general case remains unsolved. For example, the linear relation as a limiting case of a parabola cannot yet be obtained. The difficulties are considerable. Those to be expected in any problem of curve fitting are increased by the further complication that the slope of the curve in the neighbourhood of each point enters into the compounded error variance, so that the equations are non-linear. A description of an approximate method, specifically applied to a parabolic relation, is given in the appendix. It is hoped to discuss further developments elsewhere, having in mind that the Jacobian contains in itself the whole scheme of alternative hypotheses that can be introduced into a discussion of any class of bivariate structural relationship.


4. CONCLUSION 

Whether or not the physicist or engineer needs to use statistical methods of inference is perhaps beside the point. His procedure consists of a series of steps. First, an endeavour is made to isolate certain phenomena. Secondly, an attempt is made to reduce them to measurements by conventional rules, these being generally accepted by scientists. Thirdly, the results of reliable experiments are compared with (a) general background knowledge and (b) present theoretical (structural) conceptions. Then a fresh experiment is devised. The interesting question is whether an investigation of the type undertaken here can show the strength and weakness of the several steps, adopted intuitively or unconsciously, by the practising scientist. For this reason, further investigations on the present lines appear to be necessary.


The author is indebted to Mr F. Fereday, Mr S. R. Broadbent and Mr W. D. Rey for several useful discussions during the course of the investigations and to Mr J. G. Bennett for guidance in formulating his ideas in the form of the three regularities given in §3.
APPENDIX

Approximate treatment of the curvilinear relation by relaxation of constraints

The solution given for two observations and normally distributed errors can be used to treat approximately the general case. First we relax the constraints on the observations imposed by the structural relation. We have to assume that the observations are fairly evenly spaced along the curve and that there are no 'repeat' observations. Repeat observations would require, in any theory, special treatment. We use the approximation (3·15) and the intercepts $k_s$ of the curve on an equidistant grid of ordinates with abscissae $(sh + l)$. The grid is chosen so that the abscissa $(sh + l)$ is close to the observed value $x_s$, there being $2n$ lines on the grid. Then, choosing the origin so that $l$ is zero:
$$k_{2r} = \alpha_{0r} + (2rh)\tan\theta_r, \qquad k_{2r-1} = \alpha_{0r} + (2r - 1)h\tan\theta_r,$$
whence
$$\alpha_{0r} = 2r\,k_{2r-1} - (2r - 1)\,k_{2r}, \qquad \tan\theta_r = (k_{2r} - k_{2r-1})/h.$$

Tn this case 


{Yar + kort pues: , 
T 1+ (kor — ks, I2 
The constraints on $k_s$ arising from a structural relation are now easily introduced. For example, a polynomial of degree $m$ gives
$$k_s = \sum_{i=0}^{m} \alpha_i (sh)^i, \tag{A 4}$$
there being $(m + 1)$ unknown parameters $\alpha_i$. Since $2nQ$, or $\sum(u_r^2 + v_r^2)$, has a $\chi^2$-distribution, minimum $\chi^2$ estimators (also maximum-likelihood estimators) may then be found for the $\alpha_i$. The minimum value of $2nQ$ has then an approximate $\chi^2_{2n-m-1}$ distribution. The degree of the polynomial required for representation of the data can thus be found. Owing to the form of equation (A 3), it is necessary in practice to adopt an iterative method for finding the parameters $\alpha_i$.


Parabolic relation

The approximate treatment of the preceding sub-section yields a statistical test for establishing the degree $m$ of a polynomial (A 4) that adequately represents a structural relation between two variates, subject to normal errors, the alternative hypothesis being that a polynomial of degree $(m + 1)$ or higher is necessary. Consider the parabolic relation
$$Y = \alpha_0 + \alpha_1 X + \alpha_2 X^2. \tag{A 5}$$
When the points on the grid $(sh, k_s)$ lie on this parabola, we have from (A 3)
$$\alpha_{0r} = \alpha_0 - \alpha_2 h^2\,2r(2r - 1), \qquad \tan\theta_r = \alpha_1 + \alpha_2 h(4r - 1). \tag{A 6}$$
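The chord parameters for a parabola can be verified directly from the grid intercepts. The Python sketch below is illustrative only: the function names are hypothetical and the closed forms are those reconstructed from the text. It computes the intercept and slope of the chord joining the grid points $s = 2r - 1$ and $s = 2r$ and compares them with the closed-form expressions.

```python
import math

def chord_from_grid(alpha, h, r):
    """Intercept and slope of the chord through grid points 2r-1 and 2r.

    alpha : (alpha0, alpha1, alpha2) of Y = alpha0 + alpha1*X + alpha2*X**2
    h     : grid spacing (origin chosen so that abscissae are s*h)
    r     : index of the pair, r = 1, 2, ...
    """
    a0, a1, a2 = alpha

    def k(s):
        x = s * h
        return a0 + a1 * x + a2 * x * x

    k_odd, k_even = k(2 * r - 1), k(2 * r)
    tan_theta = (k_even - k_odd) / h
    alpha0r = 2 * r * k_odd - (2 * r - 1) * k_even
    return alpha0r, tan_theta

def chord_closed_form(alpha, h, r):
    """Closed-form intercept and slope of the same chord for a parabola."""
    a0, a1, a2 = alpha
    return a0 - a2 * h * h * 2 * r * (2 * r - 1), a1 + a2 * h * (4 * r - 1)
```

The two functions agree for every $r$, which shows that the slope of each chord is linear in $r$ while its intercept falls away quadratically from $\alpha_0$.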
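Then the equations determining $\alpha_0$, $\alpha_1$, $\alpha_2$ for minimum $2nQ$ are
$$\sum_{r=1}^{n} a_r\cos\theta_r = 0, \qquad \sum_{r=1}^{n} c_r\cos^2\theta_r = 0, \qquad \sum_{r=1}^{n}\{r\,c_r\cos^2\theta_r - h\,r(2r - 1)\,a_r\cos\theta_r\} = 0, \tag{A 7}$$
where
$$a_r = (2\alpha_{0r} - y_{2r} - y_{2r-1})\cos\theta_r + (x_{2r} + x_{2r-1})\sin\theta_r,$$
$$c_r = \tfrac12\sin 2\theta_r\,[2(x_{2r}^2 + x_{2r-1}^2) - (y_{2r} - y_{2r-1})^2 - (2\alpha_{0r} - y_{2r} - y_{2r-1})^2] + 2\cos 2\theta_r\,[\alpha_{0r}(x_{2r} + x_{2r-1}) - (x_{2r}y_{2r} + x_{2r-1}y_{2r-1})]. \tag{A 8}$$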
Let $\alpha_0^1$, $\alpha_1^1$, $\alpha_2^1$ be an approximate solution of equation (A 7) and $\alpha_{0r}^1$, $\theta_r^1$, $a_r^1$, $c_r^1$ the corresponding quantities from equations (A 6, 8). Then, expanding (A 7) by Taylor's theorem, we find the approximate linear simultaneous equations for $\Delta\alpha_0$, $\Delta\alpha_1$, $\Delta\alpha_2$, where $\Delta\alpha_0 = (\alpha_0 - \alpha_0^1)$, etc., to be
Z2cos? 0 Aa, + Eb! cos? 0! Na, + E(bL4r — 1 —h 4r2r — 1} x cos? 03h Aa, + Xa! cos Ot = 0, 
E2b} cos? 01 Aa, + Eet cos? 01 Nai, + X(ehdr-1 —hbi4r2r—] Vx cos? 03h Aa, + Nel eost 0} = 0, 
X2r cos? (2b; — h 2r — 1) Agg + Xr cos? 0201 — (2r — 1) bt} Aa, 
+ Er cos? 024r — 1 cl — hb 2r — 14125 — 1 4 24r 2r — 12) Mt 
4- X(2clcos 01 — h 2r — lal) recos! = 0, (A9) 
where by = (tart Var-1) COS 20, — (225, — Yor — Vor-1) Sin 20,, 
e, = (2(v$, Heira) — (yos — Vari)? — (22, — Yor — Wo a 


sin? 20, 


x {cos 20, cos? 0, — ) — {2sin 20,(2 cos? 0, + cos 0,)! 


7 A 10) 
X [tr rar à) — (ap Yor + ori Yar—1)}- ( 


Using equations (A 9) an iterative method may be employed to find the values of $\alpha_0$, $\alpha_1$, $\alpha_2$ that minimize $(2nQ)$. The first approximation may be found easily from the equation of the parabola through likely points on the curve.


Note added by Author in proof 


Professor M. G. Kendall has kindly drawn my attention to a paper in which Working and Hotelling (1929)* calculate the variances of the slope and intercept of a linear regression and show that these define an hyperbolic region within which the 'true' line lies with an assigned probability. Their hyperbola is similar to that given in equation (2·15) above, the denominator of the first term $(Y - \alpha_1 X)^2$ being (x2) instead of (0, -w). Working and Hotelling ascribe two degrees of freedom to $\chi^2$, due to a (presumed) independent variation of the slope and intercept of the line: I have been unable to follow their reasoning on this point. In my case, $nQ$ has a $\chi^2$-distribution, but I have called for independent knowledge of the variance in the error of the y-measurement.

The two papers deal with different questions: in particular, I was mainly concerned with the case when both variates are in error. But the duality of the confidence ellipse and the acceptance hyperbola noted in my section 2 is anticipated in the Working and Hotelling paper.


* H. Working & H. Hotelling (1929). Applications of the theory of error to the interpretation of trends. J. Amer. Statist. Ass. Suppl. 24, 73.
AN ANALYSIS OF PAIRED COMPARISON DESIGNS
WITH INCOMPLETE REPETITIONS†


By JOHN W. WILKINSON 


University of North Carolina 


1. INTRODUCTION 


Experiments involving paired comparisons have mainly concerned the situation where each judge compares all possible pairs of treatments or objects. In certain types of experiments this would require an excessive number of comparisons to be made by any observer. To overcome this handicap, Bose (1956) and Kendall (1955) constructed certain designs, balanced with respect to objects and judges, which do not require each judge to compare all possible pairs of objects. However, neither Bose nor Kendall proposed any procedure of analysis in their respective papers.

The purpose of this paper is to consider the problem of analysis of the Bose-Kendall paired comparison designs. This analysis is carried out in the tradition of the fundamental Bradley-Terry (1952) paper concerning the situation where all possible pairs are compared by each judge. Using the Bradley-Terry mathematical model, likelihood ratio tests are derived in detail for certain classes of hypotheses, and are stated for some additional situations of interest. To exemplify the test procedures, the proposed analysis is applied to an experiment involving pairwise comparison of handwriting specimens.


2. DEFINITION OF PAIRED COMPARISON DESIGNS 


A paired comparison design, as defined by Bose, is as follows:

Suppose we have $t$ objects which we desire to compare according to a certain characteristic, and we have $v$ judges available to perform the comparisons. The procedure of comparison will be as follows:

(i) Each judge compares $r$ pairs of objects $(1 < r \le \tfrac12 t(t - 1))$, and, for each pair compared, expresses his preference for one or the other object of the pair. (It may be desirable to allow the judge to express no preference with respect to either of the objects forming the pair. However, this possibility will not be entertained in this paper.)

(ii) The pairs compared by any judge are all different.

(iii) Among the pairs compared by each judge, each object appears equally often, say $\alpha$ times.

(iv) Each pair is compared by $k$ judges $(1 < k \le v)$.

(v) Given any two judges there are exactly $\lambda$ pairs which are compared by both judges.

Bose constructed three series of paired comparison designs by associating them with known balanced incomplete block designs. One of these series produced the same designs as those constructed by Kendall.


† This research was jointly supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, and the National Research Council of Canada.
From the relationships among the parameters of balanced incomplete block designs, and from the above definition of paired comparison designs, there exist the following relationships among the parameters of a paired comparison design:
$$r = \tfrac12 t\alpha, \quad \lambda(v - 1) = r(k - 1), \quad v\alpha = k(t - 1), \quad t\alpha \ge 2k, \quad \lambda \le \tfrac12\alpha(\alpha + 1) \le r. \tag{2·1}$$
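The parameter relations of a paired comparison design lend themselves to a mechanical check before a field plan is adopted. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, and the two inequalities are taken in the non-strict form suggested by the complete case $r = \tfrac12 t(t - 1)$, which is an interpretation of a damaged formula rather than a quotation of it.

```python
def design_relations(t, v, r, k, alpha, lam):
    """Check the parameter relations of a paired comparison design.

    t objects, v judges, r pairs per judge, k judges per pair,
    alpha appearances of each object per judge, lam pairs shared
    by any two judges. Returns a dict of relation -> bool.
    """
    return {
        "2r = t*alpha": 2 * r == t * alpha,
        "lambda(v-1) = r(k-1)": lam * (v - 1) == r * (k - 1),
        "v*alpha = k(t-1)": v * alpha == k * (t - 1),
        "t*alpha >= 2k": t * alpha >= 2 * k,
        "lambda <= alpha(alpha+1)/2 <= r": 2 * lam <= alpha * (alpha + 1) <= 2 * r,
    }
```

For instance, a hypothetical design with $t = 4$, $v = 3$, $r = 4$, $k = 2$, $\alpha = 2$, $\lambda = 2$ satisfies every relation, whereas changing $\lambda$ to 3 violates the second.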


3. EXPERIMENTAL PROCEDURE AND NOTATION 


Suppose we have available $v$ judges, and we require each of these judges to compare $r$ pairs of the $\tfrac12 t(t - 1)$ possible pairs of a set of $t$ objects $(a_1, a_2, \ldots, a_t)$, $(1 < r \le \tfrac12 t(t - 1))$. The $r$ pairs to be compared by each judge will be specified by the field plan of the appropriate paired comparison design employed. Presently available designs are given by Bose (1956) and by Wilkinson (1956). We shall refer to one set of $r$ pairs compared by a specific judge as an incomplete repetition or a complete repetition, depending on whether or not $r < \tfrac12 t(t - 1)$. When $r = \tfrac12 t(t - 1)$, the situation becomes that which is considered in detail by Bradley & Terry (1952), Bradley (1954a, b, 1955) and Terry, Bradley & Davis (1952).

For each pair of objects compared by judge $u$ $(u = 1, \ldots, v)$, the judge expresses a preference for one object or the other, assigning the rank one to the object judged superior and the rank two to the object judged inferior.

Let $r_{iju}$ denote the rank assigned by judge $u$ to the $i$th object when it is compared with the $j$th object. Let $(a_i \to a_j \mid u)$ denote the preference indicated by judge $u$ for object $a_i$ over object $a_j$ $(i \ne j;\ i, j = 1, \ldots, t;\ u = 1, \ldots, v)$. Then
$$r_{iju} = \begin{cases} 1, & \text{if } (a_i \to a_j \mid u), \\ 2, & \text{if } (a_i \leftarrow a_j \mid u), \end{cases} \tag{3·1}$$
if objects $a_i$ and $a_j$ are compared by judge $u$. From (3·1) it is readily observed that
$$r_{iju} + r_{jiu} = 3, \tag{3·2}$$
when objects $a_i$ and $a_j$ are compared by judge $u$. To handle the complication created by the fact that each judge does not compare all possible pairs of objects, we shall define
$$n_{iju} = \begin{cases} 1, & \text{if } a_i \text{ and } a_j \text{ are compared by judge } u, \\ 0, & \text{if } a_i \text{ and } a_j \text{ are not compared by judge } u, \end{cases} \tag{3·3}$$
$(i \ne j;\ i, j = 1, \ldots, t;\ u = 1, \ldots, v)$, and we shall conventionally take
$$n_{iiu} = 0 \quad (i = 1, \ldots, t;\ u = 1, \ldots, v).$$
Hence, for each judge $u$ we can define an incidence matrix $N_u = (n_{iju})$ which is symmetric, has 1's or 0's for elements, and is specified by the field plan of the paired comparison design used. These matrices $N_u$ satisfy the conditions
$$\text{(a)}\ N_u E = \alpha E \quad (u = 1, \ldots, v), \qquad \text{(b)}\ \sum_{u=1}^{v} N_u = k(E - I), \qquad \text{(c)}\ \operatorname{tr} N_u N_w = 2\lambda \quad (u \ne w;\ u, w = 1, \ldots, v), \tag{3·4}$$
where $E$ is a matrix of order $t$ each of whose elements is unity and $I$ is the identity matrix of order $t$.
Conditions (a), (b), (c) are directly necessary for the conditions (iii), (iv), (v) of §2. Condition (ii) is satisfied since $N_u$ has only 0 and 1 as elements. Since $2r$ is the number of unities in $N_u$, the constancy of $r$ follows from (a), which implies the number of unities in each row of $N_u$ to be $\alpha$. Hence $2r = t\alpha$.
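The incidence-matrix conditions can be illustrated on a small design. The Python sketch below is a hedged illustration: the design `EXAMPLE_DESIGN` (four objects, three judges, each judge omitting one of the three perfect matchings of the objects) is a hypothetical example constructed for this purpose and is not taken from Bose or Kendall, and the function names are likewise assumptions. The code builds each judge's incidence matrix, then recovers $\alpha$ from the row sums, $k$ from the sum of the matrices and $\lambda$ from the traces of the pairwise products.

```python
def incidence_matrix(t, pairs):
    """Symmetric 0/1 matrix with a 1 wherever a pair is compared by the judge."""
    n = [[0] * t for _ in range(t)]
    for i, j in pairs:
        n[i][j] = n[j][i] = 1
    return n

# Hypothetical design: objects 0..3, one list of compared pairs per judge.
EXAMPLE_DESIGN = [
    [(0, 2), (1, 3), (0, 3), (1, 2)],
    [(0, 1), (2, 3), (0, 3), (1, 2)],
    [(0, 1), (2, 3), (0, 2), (1, 3)],
]

def design_parameters(t, design):
    """Return alpha, k, lambda if the design is balanced, else raise ValueError."""
    mats = [incidence_matrix(t, pairs) for pairs in design]
    # Row sums of every N_u must all equal alpha.
    row_sums = {sum(row) for n in mats for row in n}
    # Off-diagonal elements of the sum of the N_u must all equal k.
    off_diag = {
        sum(n[i][j] for n in mats)
        for i in range(t) for j in range(t) if i != j
    }
    # tr(N_u N_w) must equal 2*lambda for every pair of distinct judges.
    traces = {
        sum(a[i][j] * b[j][i] for i in range(t) for j in range(t))
        for x, a in enumerate(mats) for y, b in enumerate(mats) if x < y
    }
    if len(row_sums) != 1 or len(off_diag) != 1 or len(traces) != 1:
        raise ValueError("design is not balanced")
    return {"alpha": row_sums.pop(), "k": off_diag.pop(), "lambda": traces.pop() // 2}
```

Applied to `EXAMPLE_DESIGN` with $t = 4$, the function recovers $\alpha = 2$, $k = 2$ and $\lambda = 2$, and each judge compares $r = 4 = \tfrac12 t\alpha$ pairs.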
To simplify notation, we will introduce the following conventions: $\sum_i$, $\sum_j$, $\sum_u$, $\sum_m$ and $\prod_i$, $\prod_j$, $\prod_u$, $\prod_m$ will indicate, respectively, single sums and products with respect to the depicted quantity over its range, where $i = 1, \ldots, t$; $j = 1, \ldots, t$; $u = 1, \ldots, v$; $m = 1, \ldots, g$. $\sum_j'$ will indicate $\sum_{j \ne i}$ for some $i$, where this $i$ will be clear from the context. $\sum_{i<j}$ and $\prod_{i<j}$ will indicate, respectively, double sums and products, $i = 1, \ldots, t - 1$; $j = 2, \ldots, t$. $\sum_{i \ne j}$ and $\prod_{i \ne j}$ will indicate, respectively, $\sum_{i=1}^{t}\sum_{j=1}^{t}$ and $\prod_{i=1}^{t}\prod_{j=1}^{t}$ with $i \ne j$. Any departures from these conventions will be specified when the departure is incurred. In addition, log and ln will be used to denote common and natural logarithms, respectively.

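4. MATHEMATICAL MODEL (BRADLEY-TERRY MODEL)

Suppose there exist numbers $\pi_{1u}, \ldots, \pi_{tu}$ corresponding to judge $u$ $(u = 1, \ldots, v)$, where
$$\pi_{iu} \ge 0, \qquad \sum_i \pi_{iu} = 1 \quad (i = 1, \ldots, t;\ u = 1, \ldots, v), \tag{4·1}$$
and such that
$$P(a_i \to a_j \mid u) = \frac{\pi_{iu}}{\pi_{iu} + \pi_{ju}}, \qquad P(a_i \leftarrow a_j \mid u) = \frac{\pi_{ju}}{\pi_{iu} + \pi_{ju}} \tag{4·2}$$
$(i \ne j;\ i, j = 1, \ldots, t;\ u = 1, \ldots, v)$. The conditions in (4·1) are imposed partially for convenience and partially to avoid indeterminacy in certain systems of equations which will be encountered. These numbers $\pi_{1u}, \ldots, \pi_{tu}$ will be considered as true ratings (or preferences) of the $t$ objects $a_1, \ldots, a_t$ corresponding to the judge $u$.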
CTION 


5. THE LIKELIHOOD FUN 
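From (4·2), the probability of judge $u$ assigning rank $r_{iju}$ to object $a_i$ and rank $r_{jiu} = 3 - r_{iju}$ to object $a_j$ upon comparison of the pair of objects is
$$\Big(\frac{\pi_{iu}}{\pi_{iu} + \pi_{ju}}\Big)^{2 - r_{iju}} \Big(\frac{\pi_{ju}}{\pi_{iu} + \pi_{ju}}\Big)^{2 - r_{jiu}} = \frac{\pi_{iu}^{\,2 - r_{iju}}\,\pi_{ju}^{\,2 - r_{jiu}}}{\pi_{iu} + \pi_{ju}}, \tag{5·1}$$
provided that the pair of objects is compared by judge $u$. We observe that if $(a_i \to a_j \mid u)$, then $r_{iju} = 1$, $r_{jiu} = 2$ and (5·1) becomes
$$\frac{\pi_{iu}}{\pi_{iu} + \pi_{ju}} = P(a_i \to a_j \mid u),$$
and if $(a_i \leftarrow a_j \mid u)$, then $r_{iju} = 2$, $r_{jiu} = 1$ and (5·1) becomes
$$\frac{\pi_{ju}}{\pi_{iu} + \pi_{ju}} = P(a_i \leftarrow a_j \mid u).$$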
If we assume probability independence between pairs of objects when compared by judge $u$, we obtain the likelihood function for the set of comparisons made by judge $u$,
$$L_u = \prod_R \Big[\frac{\pi_{iu}^{\,2 - r_{iju}}\,\pi_{ju}^{\,2 - r_{jiu}}}{\pi_{iu} + \pi_{ju}}\Big]^{n_{iju}} = \frac{\prod_i \pi_{iu}^{\,2\alpha - \sum_j' n_{iju} r_{iju}}}{\prod_R (\pi_{iu} + \pi_{ju})^{n_{iju}}} \quad (u = 1, \ldots, v), \tag{5·2}$$
where $R$ denotes the set of numbers $R = \{i, j:\ i < j;\ i = 1, \ldots, t - 1;\ j = 2, \ldots, t\}$. If we assume probability independence between experiments performed by different judges, then from (5·2) we obtain the likelihood function for the complete set of comparisons made by the $v$ judges,
$$L = \prod_u L_u, \tag{5·3}$$

and hence, if we set aj = 22v — Y X hijut ijw 
ju 


4 
In Za = VF nay — 2 È Miju lN (Tey + T ju). G ; 


i<j u 


we have 
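The likelihood of this section can be evaluated directly as a product of the single-comparison probabilities over all judges and compared pairs. The Python sketch below is illustrative only; the function names and the data layout (a list of tuples giving judge, first object, second object and the rank of the first object) are assumptions made for the example and do not appear in the paper.

```python
import math

def pair_probability(pi_i, pi_j, rank_i):
    """Probability that object i receives rank_i (1 or 2) when compared with j."""
    rank_j = 3 - rank_i  # the two ranks of a compared pair sum to 3
    return (pi_i ** (2 - rank_i)) * (pi_j ** (2 - rank_j)) / (pi_i + pi_j)

def log_likelihood(ratings, comparisons):
    """Natural log of the likelihood for all judges under independence.

    ratings[u][i] : assumed true rating of object i for judge u
    comparisons   : iterable of (u, i, j, rank_i) for each compared pair
    """
    return sum(
        math.log(pair_probability(ratings[u][i], ratings[u][j], rank_i))
        for u, i, j, rank_i in comparisons
    )
```

When all ratings are equal, every comparison contributes $\ln\tfrac12$, so the log-likelihood is $-vr\ln 2$ whatever ranks were recorded.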


6. LIKELIHOOD RATIO TESTS 
Test I. Consider the situation where each of the $v$ judges compares only one set of $r$ pairs, the set being specified by the paired comparison design used. We desire information concerning the differences, if any, among the true ratings of the $t$ objects. Initially, we are willing to make the assumption that the $v$ judges are consistent as a group; that is, we desire to test the null hypothesis,
$$H_0:\ \pi_{iu} = 1/t,$$
against the alternative hypothesis,
$$H_1:\ \pi_{iu} = \pi_i \quad (i = 1, \ldots, t;\ u = 1, \ldots, v), \tag{6·1}$$
where the $\pi$s are not all equal.


When the alternative hypothesis $H_1$ is true, (5·4) becomes
$$(\ln L \mid H_1) = \sum_i a_i \ln \pi_i - k\sum_{i<j} \ln(\pi_i + \pi_j), \tag{6·2}$$
where the simplification of the latter term is aided by the properties of the incidence matrices listed in §3.

We note that $a_i$, defined in (5·4), can be rewritten, using (2·1), as
$$a_i = 2k(t - 1) - \sum_j{}' \sum_u n_{iju} r_{iju}.$$
When $r = \tfrac12 t(t - 1)$, then $n_{iju} = 1$ for all $i, j, u$ $(i \ne j)$, $k = v$, and
$$a_i = 2v(t - 1) - \sum_j{}' \sum_u r_{iju},$$
and corresponds to $a_i$, defined by Bradley & Terry (1952), with $n$ replaced by $v$.

Now maximizing (6·2), subject to the constraint $\sum_i \pi_i = 1$, yields the set of equations
$$\frac{a_i}{\pi_i} - k\sum_j{}' \frac{1}{\pi_i + \pi_j} + \mu = 0 \quad (i = 1, \ldots, t), \qquad \sum_i \pi_i = 1. \tag{6·3}$$
Multiplying the $i$th equation of (6·3) by $\pi_i$ and summing with respect to $i$ yields
$$(2\alpha t - 4r)v + \mu = 0,$$
which implies $\mu = 0$, since from (2·1), $r = \tfrac12 t\alpha$. Hence the maximum-likelihood estimates $p_1, \ldots, p_t$ of $\pi_1, \ldots, \pi_t$ will be obtained from the system of equations
$$\frac{a_i}{p_i} - k\sum_j{}' \frac{1}{p_i + p_j} = 0 \quad (i = 1, \ldots, t), \qquad \sum_i p_i = 1. \tag{6·4}$$
When the null hypothesis $H_0$ is true, (5·4) becomes
$$(\ln L \mid H_0) = vr\ln\frac{1}{t} - \frac{kt(t - 1)}{2}\ln\frac{2}{t} = -vr\ln 2. \tag{6·5}$$
Thus the likelihood-ratio test of the null hypothesis $H_0$ against the alternative hypothesis $H_1$, specified in (6·1), will be in terms of the likelihood ratio $\lambda_1$, where, if we define
$$B^{(1)} = k\sum_{i<j} \log(p_i + p_j) - \sum_i a_i \log p_i, \tag{6·6}$$
where $p_1, \ldots, p_t$ are solutions of (6·4), then
$$\ln\lambda_1 = -\{vr\ln 2 - B^{(1)}\ln 10\}. \tag{6·7}$$
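The estimating equations for $p_1, \ldots, p_t$ have no closed solution in general, but they can be solved numerically. The Python sketch below uses a simple fixed-point iteration, $p_i \leftarrow a_i / \{k\sum_j' 1/(p_i + p_j)\}$ followed by rescaling to unit sum; this scheme is a choice made for the illustration and is not claimed to be the method of Dykstra (1956) or of Bradley & Terry (1952). The function names are hypothetical, and every $a_i$ is assumed to be positive so that the logarithms exist.

```python
import math

def estimate_ratings(a, k, iterations=2000):
    """Fixed-point solution of a_i/p_i = k * sum_{j != i} 1/(p_i + p_j), sum p = 1."""
    t = len(a)
    p = [1.0 / t] * t
    for _ in range(iterations):
        new = [
            a[i] / (k * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i))
            for i in range(t)
        ]
        total = sum(new)
        p = [x / total for x in new]
    return p

def b_statistic(p, a, k):
    """k * sum_{i<j} log10(p_i + p_j) - sum_i a_i * log10(p_i)."""
    t = len(p)
    pair_term = sum(
        math.log10(p[i] + p[j]) for i in range(t) for j in range(i + 1, t)
    )
    return k * pair_term - sum(a[i] * math.log10(p[i]) for i in range(t))

def t_statistic(p, a, k):
    """-2 ln(lambda_1) = 2*v*r*ln 2 - 2*B*ln 10, using sum(a) = v*r."""
    vr = sum(a)  # the a_i sum to the total number of comparisons
    return 2 * vr * math.log(2) - 2 * b_statistic(p, a, k) * math.log(10)
```

As a usage example with $t = 3$ and $k = 2$, the values $a = (2, 2, 2)$ return equal estimates and a statistic of zero, while $a = (3, 2, 1)$ returns estimates ordered in the same way as the $a_i$ and a positive statistic.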


When $r = \tfrac12 t(t - 1)$, then $k = v$, and $B^{(1)}$ corresponds with the statistic $B_1$ discussed by Bradley & Terry (1952).

For the likelihood ratio in (6·7) to be useful to test the hypothesis stated in (6·1), it would be desirable to have some knowledge of the distribution of $B^{(1)}$, at least when the null hypothesis $H_0$ is true.

It is possible to generate all combinations of the object sums of ranks. Then under the null hypothesis of equality of true object ratings for each judge, the probability of each combination of rank sums is obtainable. This information would be sufficient to obtain the distribution of $B^{(1)}$ under $H_0$. However, a direct computation of the probabilities of these various combinations of rank sums would be extremely tedious, even for small values of the design parameters. The fact that every permutation of the rank sums corresponding to each judge is not possible, since each judge does not compare all possible $\tfrac12 t(t - 1)$ pairs, is the cause of the complication. However, we can utilize the symmetry of the designs to circumvent this particular difficulty.

Let $A_i$ $(i = 1, \ldots, t)$ denote the set of $v\alpha = k(t - 1)$ elements which are the $k(t - 1)$ ranks assigned by all the $v$ judges to the object $a_i$ when compared with the other $(t - 1)$ objects. That there are $v\alpha = k(t - 1)$ such ranks follows from the fact that each of the $v$ judges compares $a_i$ with exactly $\alpha$ of the remaining $(t - 1)$ objects. Also, from the design properties depicted in §§ 2 and 3, it follows that these $k(t - 1)$ ranks are those for $a_i$ after comparison with each of the remaining $(t - 1)$ objects exactly $k$ times. Hence, we can subdivide $A_i$ into $k$ disjoint subsets $A_{i1}, \ldots, A_{ik}$, with each $A_{iu'}$ containing $(t - 1)$ elements which are the ranks for
$a_i$ after comparison with each of the remaining $(t - 1)$ objects exactly once. Thus if we denote the $(t - 1)$ elements of $A_{iu'}$ by $r_{iju'}$ $(j \ne i;\ j = 1, \ldots, t)$, then the sum of elements of $A_{iu'}$ is
$$S_{iu'} = \sum_j{}' r_{iju'}, \tag{6·8}$$
and the sum of the elements of $A_i$ is
$$\sum_{u'=1}^{k} \sum_j{}' r_{iju'} = \sum_j{}' \sum_u n_{iju} r_{iju}. \tag{6·9}$$
Hence the set of rank sums
$$\Big(\sum_j{}' \sum_u n_{1ju} r_{1ju}, \ldots, \sum_j{}' \sum_u n_{tju} r_{tju}\Big) = \Big(\sum_{u'=1}^{k} \sum_j{}' r_{1ju'}, \ldots, \sum_{u'=1}^{k} \sum_j{}' r_{tju'}\Big) = \sum_{u'=1}^{k} (S_{1u'}, \ldots, S_{tu'}) \tag{6·10}$$
could be considered as being the set of rank sums obtained from $k$ complete repetitions of all $\tfrac12 t(t - 1)$ pairs. It is these rank sums which determine $B^{(1)}$, and hence its distribution can be determined from the distribution of the different combinations of these rank sums. The possibility of putting the rank sums in the form of (6·10) enables the distribution of $B^{(1)}$, under the null hypothesis $H_0$ of equality of true object ratings, to be obtained in a manner described by Bradley & Terry (1952). Hence, tables presented by Bradley & Terry (1952) and by Bradley (1954b) may be used to provide the distribution of $B^{(1)}$ under the null hypothesis $H_0$. The tables are available for design parameters $t$ and $k$ in the following ranges: $t = 3$, $k = 1, \ldots, 10$; $t = 4$, $k = 1, \ldots, 8$; $t = 5$, $k = 1, \ldots, 5$. These tables also list the estimates $p_1, \ldots, p_t$ corresponding to the rank sum combination (6·10).

Thus a test procedure for (6·1), for $t$ and $k$ in the indicated range, is as follows: Determine the rank sums (6·10). Then from the previously indicated tables, obtain the corresponding values for $p_1, \ldots, p_t$, $B^{(1)}$, and the probability $P$ that $B^{(1)}$ will not be exceeded if the null hypothesis $H_0$ is true.

ed: 


If either t or ki i j 
T t or kis outside the range for the tables, the above test procedure cannot be ies Yes 


However, i i i ndi A x à a 
er, if only k is outside the indicated range, it is possible to use the available t^^ ; 


to obtain the estimates Px, +++) Dy or at least a good first approximation to them, depe” 4 d 
on whether or not there exists an integer c which divides the rank sums and k eve™Y nd 
which is such that k/c is within the above indicated range for k. This technique is discus " 
in detail by Terry et al. (1952) and by Bradley (19545). l : 
If £7 5, the available tables will not be of assistance in obt 
Py «++» except in special cases. Hence, to obtain e 
be necessary to solve equations (6-4). Some meth 
this form are suggested by Dykstra (1956), and Bradley & Terry (1952) 


Once p, ..., p, have been obtained, B® can be ev: 
zeo " aluated fi 
level of B® can only be approximatel. ss 
in terms of the statistic 


£0 
" ions 
aining approximation” will 
stimates, p,, +++) Pp in this case; } ns of 
ods aiding in the solution of equati?' 
e 
gat mu 
(6-6), but the significhi pe 
y determined. When this is the case, the test 
3!) 
TO = —21nA, = 2vrln2—2B0In 10 m 
the discussion of which is reserved for § 7, w? 
Test II. Suppose that in (6-1) the $\pi$'s of the alternative hypothesis are divided into two
groups, the elements within each group being equal. That is, we desire to test the null
hypothesis
$$H_0\colon\ \pi_i = 1/t \quad (i = 1, \ldots, t)$$
against the alternative hypothesis
$$H_2\colon\ \pi_i = \pi \quad (i = 1, \ldots, s), \qquad \pi_i = \frac{1 - s\pi}{t - s} \quad (i = s + 1, \ldots, t). \tag{6-12}$$


For this situation, the equations (6-4) can be solved explicitly for $p$, the maximum-likelihood
estimate of $\pi$, which, using (2-1), is
$$p = \frac{ks(4t - s - 3) - 2\sum_{i=1}^{s}\sum_j\sum_u n_{iju}r_{iju}}{ks(5st - 2t^2 - 6s + 3t) - 2(2s - t)\sum_{i=1}^{s}\sum_j\sum_u n_{iju}r_{iju}}. \tag{6-13}$$
This expression diminishes in importance if we proceed in a manner analogous to
that of Bradley (1955) and let $X$ denote the number of times an object of the first group
of $s$ objects ranks above an object of the second group of $(t - s)$ objects. Then
$$p = \frac{X}{ks(t - s)^2 + (2s - t)X}, \tag{6-14}$$
and the test procedure for (6-12) reduces to that of the binomial or sign test based on the
$ks(t - s)$ comparisons of the objects of the first group with objects of the second group.
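A small numerical sketch of Test II follows. It assumes the form $p = X/\{ks(t - s)^2 + (2s - t)X\}$ for (6-14) and checks it against the rank-sum form of (6-13), using the identity that the first-group rank sums total $\tfrac32 ks(s - 1) + 2ks(t - s) - X$ when ranks are 1 (preferred) and 2. The two-sided form of the sign test and all function names are assumptions made for illustration; the text does not say which tail is intended.

```python
import math

def estimate_from_x(x, k, s, t):
    """(6-14): estimate of the common rating pi of the first group of s objects."""
    return x / (k * s * (t - s) ** 2 + (2 * s - t) * x)

def estimate_from_rank_sum(rank_sum_first_group, k, s, t):
    """(6-13): the same estimate written in terms of the first-group rank sums."""
    num = k * s * (4 * t - s - 3) - 2 * rank_sum_first_group
    den = (k * s * (5 * s * t - 2 * t * t - 6 * s + 3 * t)
           - 2 * (2 * s - t) * rank_sum_first_group)
    return num / den

def sign_test_two_sided(x, n):
    """Exact two-sided binomial P-value for x successes in n trials, p = 1/2."""
    tail = sum(math.comb(n, i) for i in range(min(x, n - x) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

k, s, t = 3, 2, 5
n_between = k * s * (t - s)                      # 18 between-group comparisons
x = 9                                            # first group preferred 9 times
rank_sum = 3 * k * s * (s - 1) // 2 + 2 * k * s * (t - s) - x
p_from_x = estimate_from_x(x, k, s, t)
p_from_ranks = estimate_from_rank_sum(rank_sum, k, s, t)
p_value = sign_test_two_sided(x, n_between)
print(p_from_x, p_from_ranks, p_value)
```

With $X$ equal to half of the $ks(t - s)$ comparisons both forms return $1/t$, the null value, which is a convenient check on the algebra.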

Test III. Suppose we have $g$ groups with $v$ judges in each group. We are interested in
investigating the equality of the true treatment ratings of the $t$ objects under the assumption
of within-group judge consistency, but not necessarily assuming between-group judge
consistency. It should be observed that these $g$ groups could contain the same $v$ judges,
where due to some additional feature, such as significant time lag between repetitions, or
training gained from continued experimentation, we are unwilling to assume between-group
judge consistency.
Let $\pi^m_{1u}, \ldots, \pi^m_{tu}$ represent the true ratings of the $t$ objects corresponding to the $u$th judge
in the $m$th group ($u = 1, \ldots, v$; $m = 1, \ldots, g$). Then, we wish to test the null hypothesis,
$$H_0\colon\ \pi^m_{iu} = 1/t$$
against the alternative hypothesis,
$$H_3\colon\ \pi^m_{iu} = \pi^m_i \quad (i = 1, \ldots, t;\ u = 1, \ldots, v;\ m = 1, \ldots, g). \tag{6-15}$$

Let $n^m_{iju} = 1$ or $0$ depending on whether or not the $i$th and $j$th objects are compared by the
$u$th judge of the $m$th group ($m = 1, \ldots, g$). It is clear that the corresponding incidence
matrices have similar properties to those given previously. Let $r^m_{iju}$ denote the rank of the
$i$th object when compared with the $j$th object by the $u$th judge of the $m$th group, so that
$$r^m_{iju} = 3 - r^m_{jiu} \quad (m = 1, \ldots, g)$$
when the $i$th and $j$th objects are compared by the $u$th judge of the $m$th group.
Now, under the assumption of probability independence between pairs of objects, judges
and groups of judges, and by defining
$$a^m_i = 2\alpha v - \sum_j\sum_u n^m_{iju}r^m_{iju}, \tag{6-16}$$
we obtain
$$(\ln L_3 \mid H_3) = \sum_m\Big\{\sum_i a^m_i\ln\pi^m_i - k\sum_{i<j}\ln(\pi^m_i + \pi^m_j)\Big\}, \tag{6-17}$$
$$(\ln L_3 \mid H_0) = -vrg\ln 2, \tag{6-18}$$
where $L_3$ is the likelihood function, and $(\ln L_3 \mid H_i)$ denotes the natural logarithm of the
likelihood function when the hypothesis $H_i$ is true ($i = 0, 3$). Then, if we denote the
likelihood ratio by $\lambda_3$ and define
$$B^{(3)}_m = k\sum_{i<j}\log(p^m_i + p^m_j) - \sum_i a^m_i\log p^m_i, \tag{6-19}$$
where $p^m_1, \ldots, p^m_t$, the maximum-likelihood estimates of $\pi^m_1, \ldots, \pi^m_t$, are solutions of the system
of equations
$$\frac{a^m_i}{p^m_i} - k\sum_{j \ne i}(p^m_i + p^m_j)^{-1} = 0 \quad (i = 1, \ldots, t), \qquad \sum_i p^m_i = 1 \quad (m = 1, \ldots, g), \tag{6-20}$$
and define
$$B^{(3)} = \sum_m B^{(3)}_m, \tag{6-21}$$
and
$$T^{(3)}_m = 2vr\ln 2 - 2B^{(3)}_m\ln 10, \tag{6-22}$$
we obtain
$$T^{(3)} = -2\ln\lambda_3 = \sum_m T^{(3)}_m. \tag{6-23}$$


m 
The probability of a specified value $B$ of $B^{(3)}$ can be obtained by determining the
probabilities of the joint occurrences of all combinations of $B^{(3)}_1, \ldots, B^{(3)}_g$ such that
$\sum_m B^{(3)}_m = B$, and then summing these probabilities. Clearly, the distribution of $B^{(3)}_m$
($m = 1, \ldots, g$) under $H_0$ is the same as the distribution of $B^{(1)}$ under $H_0$. Hence, the
distribution of $B^{(3)}$ under $H_0$ can be obtained from the distribution of $B^{(1)}$ under $H_0$.
Tables of this distribution for $t = 3$, $g$, $k = 2, \ldots, 5$ are given by Bradley & Terry (1952),
and for $t = 4$, $g$, $k = 2$ by Wilkinson (1956).
The test procedure for (6-15) is then the following: Compute the rank sums
$$\sum_j\sum_u n^m_{iju}r^m_{iju} \quad (i = 1, \ldots, t),$$
corresponding to the $m$th group of judges ($m = 1, \ldots, g$). Then, if $t$ and $k$
are within the range $t = 3$, $k = 1, \ldots, 10$; $t = 4$, $k = 1, \ldots, 8$; $t = 5$, $k = 1, \ldots, 5$, obtain
$B^{(3)}_m$ from the previously indicated tables.
For large $t$, $g$, $k$ the exact significance level for (6-15) will not be available. When this is
the case, the test will be in terms of $T^{(3)}$ of (6-23), the approximate significance level of
which can be determined as indicated in §7. It will be necessary to solve the equations in
(6-20) for $p^m_i$ ($i = 1, \ldots, t$; $m = 1, \ldots, g$), and then to evaluate $B^{(3)}$ using (6-19) and (6-21).
Test IV. It is conceivable that the $g$ groups of $v$ judges considered in Test III could have
had the same judges in each group. For this situation, if we are willing to assume consistency
of the judges from one repetition to the next, we may wish to test the null hypothesis
$$H_0\colon\ \pi^m_{iu} = 1/t,$$
against the alternative hypothesis,
$$H_4\colon\ \pi^m_{iu} = \pi_i \quad (i = 1, \ldots, t;\ u = 1, \ldots, v;\ m = 1, \ldots, g). \tag{6-24}$$
The likelihood ratio test for (6-24) is provided by computing the rank sums
$$\sum_m\sum_j\sum_u n^m_{iju}r^m_{iju} \quad (i = 1, \ldots, t)$$
and
$$B^{(4)} = gk\sum_{i<j}\log(p_i + p_j) - \sum_i a^{\cdot}_i\log p_i, \tag{6-25}$$
where $a^{\cdot}_i = \sum_m a^m_i$, and $p_1, \ldots, p_t$, the maximum-likelihood estimates of $\pi_1, \ldots, \pi_t$, are solutions
of the equations
$$\frac{a^{\cdot}_i}{p_i} - gk\sum_{j \ne i}(p_i + p_j)^{-1} = 0 \quad (i = 1, \ldots, t), \qquad \sum_i p_i = 1.$$
$B^{(4)}$ corresponds to $B^{(1)}$ with $k$ replaced by $gk$. Thus, for $t$ and $gk$ within the range of available
tables, $B^{(4)}$ and its exact significance level can be obtained directly from these tables. When
either $t$ or $gk$ is outside the range of available tables, an exact test is not available. In this
case, the test will be in terms of the statistic
$$T^{(4)} = 2vrg\ln 2 - 2B^{(4)}\ln 10, \tag{6-26}$$


the approximate significance level of which can be determined as indicated in §7.
It should be noted that there are many ways in which $g$ sets of $r$ pairs could be assigned to
each of the $v$ judges. No matter how the sets of pairs are assigned, the statistic $B^{(4)}$ will have
the same distribution under the null hypothesis $H_0$ of (6-24). This would suggest that simplicity
of assignment of the $g$ sets to the $v$ judges would be the popular criterion. However, it is
felt that it would be better to have each judge compare as many different sets of $r$ pairs
as possible, as this may provide greater connectivity. The investigation of an exact criterion
is desirable, but will not be considered at this time.

Test V. For the situation of Test III we now wish to test for judge consistency between
groups; that is, we wish to test the null hypothesis
$$H_0\colon\ \pi^m_{iu} = \pi_i$$
against the alternative hypothesis,
$$H_5\colon\ \pi^m_{iu} = \pi^m_i \quad (i = 1, \ldots, t;\ u = 1, \ldots, v;\ m = 1, \ldots, g).$$
The likelihood-ratio test is in terms of the statistic
$$T^{(5)} = 2(B^{(4)} - B^{(3)})\ln 10, \tag{6-27}$$
the distribution of which under the null hypothesis will depend on $\pi_1, \ldots, \pi_t$. Hence,
$T^{(5)}$ will supply an approximate test but will not provide an exact parameter-free test.
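The relation between the statistics of Tests III, IV and V can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes $B$ has the form $k\sum_{i<j}\log_{10}(p_i + p_j) - \sum_i a_i\log_{10}p_i$ evaluated at the maximum-likelihood estimates, found here by a plain fixed-point iteration, and it uses the two sets of values $a^1 = (9, 9, 5, 2, 5)$ and $a^2 = (10, 9, 6, 2, 3)$ from the worked examples of §9 ($g = 2$, $k = 3$, $v = 6$, $r = 5$). The helper name `b_at_mle` is hypothetical.

```python
import math

def b_at_mle(wins, reps, iters=5000):
    """B at the ML estimates for win totals `wins` and `reps` complete repetitions."""
    t = len(wins)
    p = [1.0 / t] * t
    for _ in range(iters):
        p = [wins[i] / (reps * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i))
             for i in range(t)]
        total = sum(p)
        p = [x / total for x in p]           # keep the estimates adding to 1
    pairs = sum(math.log10(p[i] + p[j]) for i in range(t) for j in range(i + 1, t))
    return reps * pairs - sum(a * math.log10(x) for a, x in zip(wins, p))

groups = [[9, 9, 5, 2, 5], [10, 9, 6, 2, 3]]     # a_i^m for m = 1, 2
k, v, r = 3, 6, 5
g = len(groups)

B3 = sum(b_at_mle(a, k) for a in groups)          # (6-21): separate estimates per group
pooled = [sum(col) for col in zip(*groups)]       # sum over groups of a_i^m
B4 = b_at_mle(pooled, g * k)                      # (6-25): common estimates, k -> gk

T3 = 2 * v * r * g * math.log(2) - 2 * B3 * math.log(10)   # (6-23)
T4 = 2 * v * r * g * math.log(2) - 2 * B4 * math.log(10)   # (6-26)
T5 = 2 * (B4 - B3) * math.log(10)                          # (6-27)
print(round(T3, 2), round(T4, 2), round(T5, 2))
```

Because the constant $2vrg\ln 2$ cancels, $T^{(5)} = T^{(3)} - T^{(4)}$, and it is non-negative since fitting a separate set of ratings to each group can only increase the maximized likelihood.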
It is natural to desire a test concerning the true ratings of the $t$ objects without assuming
consistency of the judges as a group; that is, we would like to have a test of the null
hypothesis,
$$H_0\colon\ \pi_{iu} = 1/t,$$
against the alternative hypothesis,
$H$: the $\pi_{iu}$'s are not all equal.
The likelihood ratio could be determined but its distribution would depend on the paired
comparison design being used, the number $g$ of repetitions made by each judge, and the
criterion for assigning the $g$ sets of $r$ pairs to each judge. The assignment of these sets is, in
itself, a design problem. However, it is conceivable that with the assignment of these sets
to satisfy certain additional symmetry conditions, tabulation of the distribution for small
values of $t$ would be practicable.
Of course, if $g = cv$, $c > 1$, such that each judge compares each of all the possible $v$ different
sets of $r$ pairs $c$ times, the situation reduces to the equivalent of each judge making the
comparison of $ck$ sets of all possible $\tfrac12 t(t - 1)$ pairs. A test for this is given by Bradley & Terry
(1952).


7. LARGE-SAMPLE DISTRIBUTIONS
Let
$$y_i = tp_i - 1 \quad (i = 1, \ldots, t), \tag{7-1}$$
where $p_1, \ldots, p_t$ are the maximum-likelihood estimates of $\pi_1, \ldots, \pi_t$ given in (6-1). Now
observe that
$$\sum_i\sum_j\sum_u n_{iju}r_{iju} = \tfrac32 kt(t - 1);$$
hence,
$$\sum_i a_i = 2\alpha vt - \tfrac32 kt(t - 1) = \tfrac12 kt(t - 1),$$
where $a_i$ is defined in (5-4). Now, upon substitution in (6-6) and (6-11), we obtain
$$B^{(1)} = \tfrac12 kt(t - 1)\log 2 + k\sum_{i<j}\log\{1 + \tfrac12(y_i + y_j)\} - \sum_i a_i\log(1 + y_i), \tag{7-2}$$
and
$$T^{(1)} = 2\sum_i a_i\ln(1 + y_i) - 2k\sum_{i<j}\ln\{1 + \tfrac12(y_i + y_j)\}, \tag{7-3}$$
respectively. Substitution of (7-1) in (6-4) yields
$$a_i = \tfrac12 k(1 + y_i)\sum_{j \ne i}\{1 + \tfrac12(y_i + y_j)\}^{-1},$$
and hence, upon substitution for $a_i$ in (7-3), we obtain
$$T^{(1)} = k\sum_i(1 + y_i)\ln(1 + y_i)\sum_{j \ne i}\{1 + \tfrac12(y_i + y_j)\}^{-1} - 2k\sum_{i<j}\ln\{1 + \tfrac12(y_i + y_j)\}. \tag{7-4}$$
$T^{(1)}$, as expressed in (7-4), can be put in the form
$$T^{(1)} = \tfrac14 kt\sum_i y_i^2 + R(y_i), \tag{7-5}$$
where $R(y_i)$ depends on higher powers of $y_i$ than the second. By the method
followed by Bradley (1955), if we redefine
$$\pi_i = \frac1t + \frac{\delta_{ik}}{\sqrt k} \quad (i = 1, \ldots, t),$$
where $\delta_{ik}$ is a sequence of constants converging to $\delta_i$ as $k \to \infty$, it can be shown that $R(y_i)$
converges to zero in probability as $k \to \infty$, and that $T^{(1)}$ has the same limiting distribution
as $\tfrac14 kt\sum_i y_i^2$. This limiting distribution is, under $H_1$, a non-central $\chi^2$-distribution with $(t - 1)$
degrees of freedom and parameter of non-centrality
$$\lambda = \tfrac14 t^3\sum_i\delta_i^2,$$
which, for large $k$, can be approximated by
$$\lambda = \tfrac14 kt^3\sum_i(\pi_i - 1/t)^2.$$
Under the null hypothesis $H_0$, $\pi_i = 1/t$ and $\delta_i = 0$ ($i = 1, \ldots, t$). Therefore $\lambda = 0$ and $T^{(1)}$
under $H_0$ has a limiting central $\chi^2$-distribution with $(t - 1)$ degrees of freedom.
Similarly, it can be shown that $T^{(3)}$, $T^{(4)}$ and $T^{(5)}$ have, under their respective null
hypotheses, limiting central $\chi^2$-distributions with $g(t - 1)$, $(t - 1)$ and $(g - 1)(t - 1)$ degrees of
freedom, respectively. Thus approximate significance levels for the indicated tests can be
obtained from tables for the $\chi^2$-distribution with appropriate numbers of degrees of freedom.
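In place of tables, the approximate significance level can be computed directly when the number of degrees of freedom is even, because the upper tail of $\chi^2$ then has a closed form. The sketch below uses only that standard identity; the function name is hypothetical, and odd degrees of freedom are deliberately not handled.

```python
import math

def chi2_upper_tail_even(x, df):
    """P(chi-square with df degrees of freedom >= x), valid for even df only:
    exp(-x/2) * sum_{j < df/2} (x/2)^j / j!"""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# t = 5 objects gives t - 1 = 4 degrees of freedom for Tests I and IV.
level_small = chi2_upper_tail_even(10.80, 4)
level_large = chi2_upper_tail_even(25.353, 4)
print(round(level_small, 4), level_large < 0.0005)
```

The two arguments are the values of $T^{(1)}$ and $T^{(4)}$ that arise in the worked examples of §9, so the results can be set beside the significance levels quoted there.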


8. EXTREME SETS OF RANK SUMS
For the situation in Test I, the extreme values which the sum of ranks corresponding to the
$i$th object may have are
$$\sum_j\sum_u n_{iju}r_{iju} = 2\alpha v \text{ or } \alpha v \quad (i = 1, \ldots, t).$$
When either of these cases arises for any of the objects, equations in (6-4), which provide
maximum-likelihood estimates for the parameters, become somewhat valueless. In
order for the rank sum to be $2\alpha v$ (or $\alpha v$), it is necessary that the $i$th object be judged inferior
(or superior) in all comparisons made by all judges. When these situations arise, the procedure
is to estimate $\pi_i$ by 0 (or 1); then, drop this object from the analysis and consider
only the remaining $t - 1$ objects.
Consider the case where the $t$th object has the extreme sum of ranks
$$\sum_j\sum_u n_{tju}r_{tju} = 2\alpha v.$$
Omitting this object from consideration, we will have a reduced set of $(t - 1)$ rank sums,
namely
$$\sum_j\sum_u n_{1ju}r_{1ju} - k,\ \ldots,\ \sum_j\sum_u n_{t-1,ju}r_{t-1,ju} - k. \tag{8-1}$$
Now, from the symmetry of the paired comparison designs, we obtain the logarithm of the
likelihood function $L$ under the alternative hypothesis $H_1$ of (6-1) to be
$$(\ln L \mid H_1) = \sum_{i=1}^{t-1}\Big[2\alpha v - k - \sum_j\sum_u n_{iju}r_{iju}\Big]\ln\pi_i - k\sum_{i<j\le t-1}\ln(\pi_i + \pi_j).$$
It also follows easily that the system of equations, which provides maximum-likelihood
estimates of $\pi_1, \ldots, \pi_{t-1}$, and the test statistic will be the same as in (6-4) and (6-6),
respectively, with $t$ replaced by $t - 1$.
Hence, the test procedure for (6-1) will be the same as that previously described with
$t$ replaced by $t - 1$, and with the original set of rank sums replaced by the reduced set (8-1).
For the case where the $t$th object has the extreme rank sum
$$\sum_j\sum_u n_{tju}r_{tju} = \alpha v,$$
the omission of this object from consideration will result in the reduced set of $(t - 1)$ rank sums
$$\sum_j\sum_u n_{1ju}r_{1ju} - 2k,\ \ldots,\ \sum_j\sum_u n_{t-1,ju}r_{t-1,ju} - 2k. \tag{8-2}$$
Similarly, the test procedure for (6-1), in this case, is the same as that previously described,
with $t$ replaced by $(t - 1)$, and with the original set of rank sums replaced by the reduced
set (8-2).


In Test IV, for the case where the $t$th object has the extreme rank sum
$$\sum_m\sum_j\sum_u n^m_{tju}r^m_{tju} = 2\alpha gv \text{ or } \alpha gv, \tag{8-3}$$
the test procedure for (6-24) is the same as previously described, with $t$ replaced by $(t - 1)$
and with the original set of rank sums replaced by
$$\sum_m\sum_j\sum_u n^m_{1ju}r^m_{1ju} - gk,\ \ldots,\ \sum_m\sum_j\sum_u n^m_{t-1,ju}r^m_{t-1,ju} - gk,$$
or
$$\sum_m\sum_j\sum_u n^m_{1ju}r^m_{1ju} - 2gk,\ \ldots,\ \sum_m\sum_j\sum_u n^m_{t-1,ju}r^m_{t-1,ju} - 2gk, \tag{8-4}$$
depending on whether the $t$th object rank sum is $2\alpha gv$ or $\alpha gv$.
For Tests III and V, if an extreme rank sum occurs for the $t$th object of the $m$th group,
namely,
$$\sum_j\sum_u n^m_{tju}r^m_{tju} = 2\alpha v \text{ or } \alpha v,$$
the respective test procedures can be employed with $t$ replaced by $(t - 1)$, and with the
original set of rank sums replaced by the reduced set of rank sums,
$$\sum_j\sum_u n^m_{1ju}r^m_{1ju} - k,\ \ldots,\ \sum_j\sum_u n^m_{t-1,ju}r^m_{t-1,ju} - k,$$
or
$$\sum_j\sum_u n^m_{1ju}r^m_{1ju} - 2k,\ \ldots,\ \sum_j\sum_u n^m_{t-1,ju}r^m_{t-1,ju} - 2k.$$
Such an alteration to the test procedure can be extended when more than one of the
groups has an extreme rank sum. For Test V, if the $t$th object in all $g$ groups has an extreme
rank sum as in (8-3), in order to evaluate $B^{(4)}$ we will need to follow the procedure described
with $t$ replaced by $t - 1$, and to employ the reduced set of rank sums of (8-4).
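The reduction described in this section is mechanical and can be sketched in a few lines. The function below is a hypothetical helper, not part of the paper: it treats a single extreme object in the Test I setting, estimates its rating by 0 or 1 as described, and returns the reduced rank sums (8-1) or (8-2). The sample rank sums are invented for illustration, chosen so that they total $\tfrac32 kt(t - 1)$.

```python
def drop_extreme_object(rank_sums, alpha, v, k):
    """Return (index, estimate, reduced_rank_sums) for the first object whose
    rank sum is extreme, or (None, None, rank_sums) if there is none.
    Rank sum 2*alpha*v: always judged inferior, rating estimated by 0,
    every other rank sum loses k (it held rank 1 in k comparisons).
    Rank sum alpha*v: always judged superior, rating estimated by 1,
    every other rank sum loses 2k (it held rank 2 in k comparisons)."""
    for index, total in enumerate(rank_sums):
        if total == 2 * alpha * v:
            estimate, shift = 0.0, k
        elif total == alpha * v:
            estimate, shift = 1.0, 2 * k
        else:
            continue
        reduced = [s - shift for j, s in enumerate(rank_sums) if j != index]
        return index, estimate, reduced
    return None, None, list(rank_sums)

# Illustrative data only: t = 5, k = 3, alpha = 2, v = 6, so 2*alpha*v = 24.
inferior_case = drop_extreme_object([15, 18, 18, 15, 24], alpha=2, v=6, k=3)
superior_case = drop_extreme_object([18, 20, 20, 20, 12], alpha=2, v=6, k=3)
print(inferior_case, superior_case)
```

In both cases the reduced rank sums total $\tfrac32 k(t - 1)(t - 2) = 54$, as they must for a design on the remaining $t - 1 = 4$ objects.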
by the judge $J_u$ ($u = 1, \ldots, 6$) will be dictated by the field plan of design (2), Table 1, given
by ... (1956). The incidence matrices for this design are:
$$N_1 = \begin{bmatrix} 0&0&1&1&0\\ 0&0&0&1&1\\ 1&0&0&0&1\\ 1&1&0&0&0\\ 0&1&1&0&0 \end{bmatrix}, \qquad
N_2 = \begin{bmatrix} 0&0&0&1&1\\ 0&0&1&0&1\\ 0&1&0&1&0\\ 1&0&1&0&0\\ 1&1&0&0&0 \end{bmatrix},$$
$$N_3 = \begin{bmatrix} 0&1&0&1&0\\ 1&0&1&0&0\\ 0&1&0&0&1\\ 1&0&0&0&1\\ 0&0&1&1&0 \end{bmatrix}, \qquad
N_4 = \begin{bmatrix} 0&1&0&0&1\\ 1&0&0&1&0\\ 0&0&0&1&1\\ 0&1&1&0&0\\ 1&0&1&0&0 \end{bmatrix}, \tag{9-1}$$
$$N_5 = \begin{bmatrix} 0&1&1&0&0\\ 1&0&0&0&1\\ 1&0&0&1&0\\ 0&0&1&0&1\\ 0&1&0&1&0 \end{bmatrix}, \qquad
N_6 = \begin{bmatrix} 0&0&1&0&1\\ 0&0&1&1&0\\ 1&1&0&0&0\\ 0&1&0&0&1\\ 1&0&0&1&0 \end{bmatrix}.$$


Now, for judge $u$ ($u = 1, \ldots, v$), we construct a $(t \times t)$ preference matrix $R_u = (n_{iju}r_{iju})$,
where $r_{iju} = 1$ or $2$ if $(a_i > a_j \mid u)$ or $(a_i < a_j \mid u)$, respectively. The preference matrix $R_u$ will
have zeros as elements corresponding to the zero elements of $N_u$, and ranks as elements
corresponding to the unit elements of $N_u$. From this definition of $R_u$, we observe that the sums
of the rows of $R_u$ yield a $(t \times 1)$ vector, the elements of which are the rank sums of the $t$ objects
corresponding to judge $u$. For example, the preference matrix $R_1$ and the rank sums vector
corresponding to judge $J_1$ for this experiment are
$$R_1 = \begin{bmatrix} 0&0&2&1&0\\ 0&0&0&1&1\\ 1&0&0&0&2\\ 2&2&0&0&0\\ 0&2&1&0&0 \end{bmatrix}, \qquad
R_1\mathbf{1} = \begin{bmatrix} \sum_j n_{1j1}r_{1j1}\\ \sum_j n_{2j1}r_{2j1}\\ \sum_j n_{3j1}r_{3j1}\\ \sum_j n_{4j1}r_{4j1}\\ \sum_j n_{5j1}r_{5j1} \end{bmatrix} = \begin{bmatrix} 3\\ 2\\ 3\\ 4\\ 3 \end{bmatrix},$$
where $\mathbf{1}$ is a unit column vector of appropriate dimension. By this procedure we obtain for
each judge the following rank sums vectors:
$J_1$: (3, 2, 3, 4, 3),
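The bookkeeping of this example can be checked with a short script. The sketch below is illustrative: the pair lists are the unit elements of the incidence matrices (9-1) as read from the text, and the preferences of judge $J_1$ are those implied by the matrix $R_1$. It verifies that every pair occurs $k = 3$ times over the six judges, that every object is compared $\alpha = 2$ times by each judge, and that the row sums of $R_1$ give the rank sums vector. The variable names are hypothetical.

```python
from itertools import combinations

t = 5
# Pairs (i, j), i < j, compared by judges J_1, ..., J_6 (unit elements of N_1, ..., N_6).
pairs_by_judge = [
    [(1, 3), (1, 4), (2, 4), (2, 5), (3, 5)],
    [(1, 4), (1, 5), (2, 3), (2, 5), (3, 4)],
    [(1, 2), (1, 4), (2, 3), (3, 5), (4, 5)],
    [(1, 2), (1, 5), (2, 4), (3, 4), (3, 5)],
    [(1, 2), (1, 3), (2, 5), (3, 4), (4, 5)],
    [(1, 3), (1, 5), (2, 3), (2, 4), (4, 5)],
]

# Number of times each of the t(t-1)/2 pairs is compared over all judges.
pair_counts = {pair: sum(pair in judge for judge in pairs_by_judge)
               for pair in combinations(range(1, t + 1), 2)}

# Number of comparisons involving each object, judge by judge (should all equal alpha = 2).
appearances = [[sum(obj in pair for pair in judge) for obj in range(1, t + 1)]
               for judge in pairs_by_judge]

# Preference matrix R_1: entry (i, j) is the rank (1 = preferred, 2 = not) of
# object i when compared with object j by judge J_1, and 0 if not compared.
R1 = [
    [0, 0, 2, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 2],
    [2, 2, 0, 0, 0],
    [0, 2, 1, 0, 0],
]
rank_sums_j1 = [sum(row) for row in R1]          # R_1 times the unit vector
ranks_add_to_three = all(R1[i - 1][j - 1] + R1[j - 1][i - 1] == 3
                         for i, j in pairs_by_judge[0])
print(rank_sums_j1, set(pair_counts.values()))
```

Each compared pair contributes ranks 1 and 2, so every judge's rank sums must total $3r = 15$, which gives a quick check on transcribed data.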
Summing these with respect to judges yields
$$\Big(\sum_j\sum_u n_{1ju}r_{1ju},\ \ldots,\ \sum_j\sum_u n_{5ju}r_{5ju}\Big) = (15, 15, 19, 22, 19). \tag{9-3}$$
Hence
$$(a_1, \ldots, a_5) = (9, 9, 5, 2, 5), \tag{9-4}$$
where $a_i$ is defined in (5-4). Entering the tables of Bradley (1954b) for $t = 5$ and for $k = 3$
complete repetitions, we obtain, corresponding to the rank sums (9-3),
$$(p_1, \ldots, p_5) = (0.38, 0.38, 0.10, 0.03, 0.10), \qquad B^{(1)} = 6.686,$$
$$P\{B^{(1)} \le 6.686 \mid H_0\} = 0.0404.$$
Hence, we would conclude, at the 0.0404 level of significance, that the five handwriting
specimens are different with respect to the characteristic x under the assumption that the
judges are consistent as a group.
It is of interest to note, from (6-11), that
$$T^{(1)} = 2vr\ln 2 - 2B^{(1)}\ln 10 = 10.80.$$
From the large-sample properties of $T^{(1)}$, we obtain the approximate significance level to
be $P(\chi^2_4 \ge 10.80 \mid H_0) = 0.028$. Thus, although the approximate test will give too many
significant results, the approximate significance level obtained is reasonably close to the
exact significance level, even for $k$ as small as 3.
Example 2. Suppose that the six judges of Example 1 make two incomplete repetitions
with $r$ pairs in each repetition. The $r$ pairs compared by the $u$th judge in the first repetition
are those indicated in (9-1), and, hence, the incidence matrices for the first repetition are
$$N^1_u = N_u \quad (u = 1, \ldots, 6).$$
The $r$ pairs compared by the $u$th judge in the second repetition are indicated by the incidence
matrices
$$N^2_u = N_{u+1} \quad (u = 1, \ldots, 5), \qquad N^2_6 = N_1.$$
The rank sums vectors corresponding to each judge for the first repetition are given in (9-2).
By an analogous procedure to that used in Example 1, we compute the rank sums vectors
corresponding to each judge on the second repetition:
$J_1$: (2, 3, 2, 4, 4),
$J_2$: (3, 2, 3, 3, 4),
$J_3$: (2, 3, 3, 4, 3),
$J_4$: (2, 3, 3, 3, 4),     (9-5)
$J_5$: (3, 2, 3, 4, 3),
$J_6$: (2, 2, 4, 4, 3).
Summing these with respect to judges, we obtain
$$\Big(\sum_j\sum_u n^2_{1ju}r^2_{1ju},\ \ldots,\ \sum_j\sum_u n^2_{5ju}r^2_{5ju}\Big) = (14, 15, 18, 22, 21). \tag{9-6}$$
Hence
$$(a^2_1, \ldots, a^2_5) = (10, 9, 6, 2, 3), \tag{9-7}$$
where $a^m_i$ is defined in (6-16). By summing the vectors in (9-6) and (9-3), we obtain
$$\Big(\sum_m\sum_j\sum_u n^m_{1ju}r^m_{1ju},\ \ldots,\ \sum_m\sum_j\sum_u n^m_{5ju}r^m_{5ju}\Big) = (29, 30, 37, 44, 40). \tag{9-8}$$
Hence, we obtain
$$(a^{\cdot}_1, \ldots, a^{\cdot}_5) = (19, 18, 11, 4, 8). \tag{9-9}$$
The result in (9-9) can be checked by adding vectors in (9-7) and (9-4). The elements of the
vector (9-8) are the rank sums which enter into the statistic $B^{(4)}$ of (6-25). Here $t = 5$ and
$gk = 6$, the corresponding number of complete repetitions, are outside the range of
available tables. Hence we are unable to obtain either the value of $B^{(4)}$ or its significance
level directly from the tables. However, since $t = 5$, we can use a procedure outlined by
Bradley (1954b) and Terry et al. (1952) to assist in evaluating $B^{(4)}$. Dividing $gk$ and the
elements of (9-8) by two, we obtain
$$\tfrac12 gk = 3, \qquad \tfrac12(9\text{-}8) = (14.5, 15, 18.5, 22, 20). \tag{9-10}$$
Then, for rank sums vectors
$$(14, 15, 19, 22, 20) \quad\text{and}\quad (15, 15, 18, 22, 20),$$
we enter the tables for $t = 5$ and for three complete repetitions and obtain corresponding
estimates
$$(0.51, 0.33, 0.08, 0.02, 0.05) \quad\text{and}\quad (0.38, 0.38, 0.14, 0.03, 0.07).$$
Then, by linear interpolation, we obtain an approximation to the estimates,
$p_1, \ldots, p_5$, corresponding to the elements of (9-10) to be
$$(0.445, 0.355, 0.110, 0.025, 0.060),$$
which, after adjustment to add to 1, is
$$(0.447, 0.357, 0.111, 0.025, 0.060).$$
Starting with these approximations and using the iterative formula given by Bradley &
Terry (1952), we obtain
$$(p_1, \ldots, p_5) = (0.441, 0.361, 0.106, 0.029, 0.063).$$
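The interpolation and refinement steps of Example 2 can be reproduced as follows. This is a sketch under assumptions: the refinement uses the fixed-point form $p_i \leftarrow a_i/\{gk\sum_{j \ne i}(p_i + p_j)^{-1}\}$ followed by rescaling, which may differ in detail from the iterative formula of Bradley & Terry (1952), and the function names are hypothetical.

```python
def interpolate_and_normalise(lower, upper):
    """Average two tabled sets of estimates, then rescale the result to add to 1."""
    mid = [(a + b) / 2 for a, b in zip(lower, upper)]
    total = sum(mid)
    return mid, [x / total for x in mid]

def refine(p, wins, reps, iters=5000):
    """Fixed-point refinement of the estimates for win totals `wins` (assumed form)."""
    t = len(p)
    for _ in range(iters):
        p = [wins[i] / (reps * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i))
             for i in range(t)]
        total = sum(p)
        p = [x / total for x in p]
    return p

est_a = [0.51, 0.33, 0.08, 0.02, 0.05]   # tabled estimates for rank sums (14, 15, 19, 22, 20)
est_b = [0.38, 0.38, 0.14, 0.03, 0.07]   # tabled estimates for rank sums (15, 15, 18, 22, 20)
raw, start = interpolate_and_normalise(est_a, est_b)
final = refine(start, wins=[19, 18, 11, 4, 8], reps=6)   # (9-9) with gk = 6
print([round(x, 3) for x in raw], [round(x, 3) for x in start], [round(x, 3) for x in final])
```

The interpolated starting values matter only for the speed of convergence; the iteration reaches the same estimates from an equal-ratings start.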
Substituting these values in (6-25), we obtain $B^{(4)} = 12.556$. This value substituted in (6-26)
yields $T^{(4)} = 25.353$, and since $T^{(4)}$ has approximately a $\chi^2$-distribution with
$t - 1 = 4$ degrees of freedom it follows that $P(\chi^2_4 \ge 25.353 \mid H_0) < 0.0005$, approximately.
Thus, under the assumption that the judges are consistent as a group and consistent from
one repetition to the next, we would conclude, at the 0.0005 level of significance
(approximately), that the five handwriting specimens are different with respect to the
characteristic x.
Example 3. Suppose in the previous example we were unwilling to assume judge
consistency from one repetition to the next. The situation is then that for Test III.
Thus we have $g = 2$ groups of $v = 6$ judges comparing $t = 5$ objects with reference to
a particular characteristic, the judges in each group being the same.
Considering the data of Examples 1 and 2 for each repetition separately, we obtain
$$(p^1_1, \ldots, p^1_5) = (0.38, 0.38, 0.10, 0.03, 0.10), \qquad B^{(3)}_1 = 6.686, \qquad T^{(3)}_1 = 10.800;$$
$$(p^2_1, \ldots, p^2_5) = (0.51, 0.33, 0.10, 0.02, 0.03), \qquad \ldots, \qquad T^{(3)}_2 = 15.809.$$




NON-NULL RANKING MODELS. I 


By C. L. MALLOWS 
University College, London 


1. INTRODUCTION AND SUMMARY
Kendall (1950) has remarked that the major outstanding problem in ranking theory is the
specification of a suitable population of ranks in non-null cases. Much attention has been
concentrated on situations which Daniels (1950, §5) calls of type (i):

'The sample is regarded as having been randomly chosen from a bivariate population
of ranks'

the underlying population being either finite or infinite, e.g. bivariate Normal. Rather less
work has been done on Daniels's type (ii):

'There is a fixed set of individuals being assessed by a population of judges, or by the
same judge in repeated trials, on a particular attribute whose ranking is known a priori.
The random element is uncertainty of preference, the correlation being the result of real
differences between the individuals, and the population is one of rankings conditional on
a given objective order.'

Daniels (1950), following Babington Smith, Thurstone and Mann, treats this problem as
one of regression. The present approach is by way of paired-comparison theory. The judge
is assumed to arrive at a ranking of $n$ objects $U_1, U_2, \ldots, U_n$ by first making all the ${}^nC_2$
comparisons between pairs independently, but then only accepting the results if they are
consistent with a ranking of the $n$ objects.
Various non-null models are proposed. The general model (eqn. (1)) depends on ${}^nC_2$
parameters; this number is then reduced to $n - 1$ by using the Bradley-Terry
paired-comparison model (eqn. (4)). An alternative method of simplification is proposed
which makes the probability of putting (in paired-comparisons) any two objects $U_i$, $U_j$
in the correct order equal to
$$\tfrac12 + \tfrac12\tanh(k\log\theta + \log\phi),$$
where $\theta$ and $\phi$ are parameters, and $k = j - i$ is the difference in true ranks of the
two objects. Thus this probability is a simple monotonic function of a quantity which
increases linearly with $k$, and a constant term. The null hypothesis
corresponds to $\theta = \phi = 1$. It is found that $\theta$ is associated with Spearman's coefficient $r_s$,
and $\phi$ with Kendall's $t_k$ (eqn. (9)). Each of these parameters has a further interpretation
[...] of this object being ranked in the various possible positions: these probabilities are in
geometric progression, decreasing away from the true position.
Putting $\phi = 1$ in the general model gives a special case of the Bradley-Terry model
(eqn. (11)). It is shown, however, that as [...] $r_s$ and $t_k$ tends to a Normal form, the two
coefficients cannot distinguish between $\theta$ and $\phi$. It is therefore proposed to put $\theta = 1$ and
to use only the one parameter $\phi$. This leads to exceptionally simple results, including an
explicit form for the probability generating function (p.g.f.) of $t_k$, and an invariance property
of the probabilities $p_{ij}$ of putting two objects $U_i$ and $U_j$ in the correct order in the ranking;
$p_{ij}$ is found to depend only on $j - i$ and $\phi$, and not on $i$ or $n$ (§9).
[...] are given; tests are derived which may be used to decide whether [...] rankings
of the same objects are consistent with the same non-null hypothesis; [...] given $m$ judges,
each of whom produces several rankings, we may test both for differences between judges
and for inconsistencies within judges. The power of these tests follows immediately from
standard theory.
An expression is derived giving the conditional expectation of $r_s$ for given $t_k$.


2. THE GENERAL MODEL

In the method of paired-comparisons, $n$ objects $U_1, U_2, \ldots, U_n$, whose a priori ranking is
assumed known, are ranked in pairs, there being ${}^nC_2$ such comparisons in all. Babington
Smith suggested that a suitable non-null hypothesis for this situation would be to
assign to each pair $(i, j)$ $(i < j)$ the probability $\pi_{ij}$ of ranking $U_i$ lower than $U_j$; this we shall
denote by
$$\pi_{ij} = P\{U_i < U_j\} \quad (1 \le i < j \le n),$$
and to assume that all the comparisons are independent.
The general non-null ranking model, depending on the ${}^nC_2$ parameters $\pi_{ij}$, is as follows:
(i) The above model is used to generate a set of ${}^nC_2$ comparisons.
(ii) If the resulting complex of comparisons is consistent, i.e. if there are no circular
triads of comparisons such as $U_i < U_j < U_k < U_i$, then the complex is equivalent to a ranking;
to each of the objects $U_i$ may be assigned an integer $u_i$ giving the position of that object in
the ranking. Thus in the case $n = 4$, the comparisons
$$U_2 < U_1,\quad U_3 < U_1,\quad U_4 < U_1,\quad U_2 < U_3,\quad U_2 < U_4,\quad U_4 < U_3$$
give the ranking $U_2 < U_4 < U_3 < U_1$, which may be expressed by
$$u_1 = 4,\quad u_2 = 1,\quad u_3 = 3,\quad u_4 = 2.$$
(iii) If the complex of comparisons is inconsistent, no ranking is possible. The procedure
(i) is repeated as often as necessary until a consistent complex is obtained; and the
resulting ranking is accepted.

The above is a possible way of generating rankings. It must be emphasized that it is
not suggested that any actual experiment would be performed in this way; if the paired-
comparisons can be made independently there is little point in trying to force the
preferences into a ranking. What is here attempted is the production of a model
for situations where the observed data is a ranking, the difficulty being that the comparisons
are no longer independent (thus if $U_1 < U_2 < U_3$ we must have $U_1 < U_3$).
We are allowing for this dependence by considering a conditional distribution of
paired-comparisons, admitting only a subset (containing $n!$ members) of the total
$2^{{}^nC_2}$ possible outcomes.
The probability of obtaining any given ranking $(u)$ from the above construction will be
proportional to the probability under the Babington Smith model of the corresponding
set of comparisons. Thus the probability of the ranking
$$(u) = (u_1, u_2, u_3, u_4) = (4, 1, 3, 2)$$
above is proportional to
$$(1 - \pi_{12})(1 - \pi_{13})(1 - \pi_{14})\,\pi_{23}\,\pi_{24}\,(1 - \pi_{34});$$
putting
$$\lambda_{ij} = \{\pi_{ij}/(1 - \pi_{ij})\}^{1/2}$$
we have that the probability of the general ranking $(u) = (u_1, u_2, \ldots, u_n)$ under this
hypothesis is
$$H_\lambda\colon\ P\{(u)\} = K_\lambda\prod_{1 \le i < j \le n}\lambda_{ij}^{\operatorname{sgn}(u_j - u_i)}, \tag{1}$$
where $K_\lambda$ is so chosen that the probabilities sum to 1; i.e.
$$K_\lambda^{-1} = \sum_{(u)}\prod_{1 \le i < j \le n}\lambda_{ij}^{\operatorname{sgn}(u_j - u_i)} = P\{\text{comparisons are consistent}\}\prod_{1 \le i < j \le n}[\pi_{ij}(1 - \pi_{ij})]^{-1/2}.$$
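The construction can be checked by brute force for small $n$. The sketch below is illustrative only (the function names and the numerical values of $\pi_{ij}$ are invented): it enumerates every outcome of the independent comparisons, keeps those without circular triads, and compares the resulting conditional probabilities with the product form (1) and with the expression for $K_\lambda^{-1}$. Consistency is detected by the fact that a set of comparisons is a ranking exactly when the numbers of objects placed below each object are $0, 1, \ldots, n - 1$ in some order.

```python
import itertools
import math

def ranking_probabilities(pi):
    """pi[(i, j)] = P{U_i < U_j} for i < j (objects labelled 1..n).
    Returns ({ranking u: conditional probability}, P{comparisons consistent})."""
    objects = sorted({x for pair in pi for x in pair})
    pairs = sorted(pi)
    weights = {}
    for outcome in itertools.product([True, False], repeat=len(pairs)):
        prob = 1.0
        below = {o: 0 for o in objects}          # objects ranked lower than o
        for (i, j), i_is_lower in zip(pairs, outcome):
            prob *= pi[(i, j)] if i_is_lower else 1.0 - pi[(i, j)]
            below[j if i_is_lower else i] += 1
        ranks = tuple(below[o] + 1 for o in objects)
        if sorted(ranks) == list(range(1, len(objects) + 1)):   # no circular triads
            weights[ranks] = prob
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}, total

def product_form(pi, u):
    """prod_{i<j} lambda_ij ** sgn(u_j - u_i), lambda_ij = sqrt(pi_ij / (1 - pi_ij))."""
    value = 1.0
    for (i, j), p in pi.items():
        lam = math.sqrt(p / (1.0 - p))
        value *= lam if u[j - 1] > u[i - 1] else 1.0 / lam
    return value

pi = {(1, 2): 0.7, (1, 3): 0.9, (2, 3): 0.6}     # illustrative values, n = 3
probs, p_consistent = ranking_probabilities(pi)
k_inverse = sum(product_form(pi, u) for u in probs)
scale = math.prod((p * (1.0 - p)) ** -0.5 for p in pi.values())
print(len(probs), round(p_consistent, 3))
```

For $n = 3$ only the two circular triads are rejected, so six of the eight outcomes survive, one for each ranking.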
3. SPECIALIZATIONS
The above general model is rather cumbersome; we now consider possible ways of
specializing it so as to reduce the number of parameters. We shall attempt to find models which
simplify
(i) the probabilities $P\{(u)\}$ of the various rankings $(u)$;
(ii) the distribution problems connected with the two coefficients $r_s$ and $t_k$;
(iii) various other interesting and important quantities, such as
$$p_{ij} = P\{u_i < u_j \mid \text{all comparisons consistent}\},$$
which are complicated in the general case.


4. THE BRADLEY-TERRY MODEL
In the paired-comparison case, Bradley & Terry (1952) reduce the number of parameters
from ${}^nC_2$ to (effectively) $n - 1$ by assuming that
$$\pi_{ij} = P\{U_i < U_j\} = \pi_i/(\pi_i + \pi_j) \quad (1 \le i < j \le n) \tag{3}$$
for some non-negative numbers $\pi_1, \pi_2, \ldots, \pi_n$. This gives
$$\lambda_{ij} = (\pi_i/\pi_j)^{1/2}$$
and for the proposed ranking model, we have
$$H_\pi\colon\ P\{(u)\} \propto \prod_{1 \le i < j \le n}(\pi_i/\pi_j)^{\frac12\operatorname{sgn}(u_j - u_i)} = \prod_{1 \le i \le n}\pi_i^{\frac12(n+1) - u_i} \propto \prod_{1 \le i \le n}\pi_i^{-u_i}. \tag{4}$$
This is a term in the expansion of the permanent
$$\operatorname{per}\,|\pi_i^{-j}| \quad (i, j = 1, 2, \ldots, n). \tag{5}$$
We note that this model is closely associated with what we may call a 'generalized matching
coefficient', this being taken to mean a coefficient,
$$A(u) = \sum_{1 \le i \le n}a_{iu_i},$$
where the $(a_{ij})$ can take any values whatsoever. Examples of measures which are essentially
of this form are:
(i) Number of matches; put $a_{ij} = \delta_{ij}$ (Kronecker $\delta$).
(ii) Spearman's $r_s$; put $a_{ij} = (i - j)^2$.
(iii) Spearman's 'footrule'; put $a_{ij} = |i - j|$.
Others which may be considered are:
(iv) Number of matches + near misses; put $a_{ij} = \delta_{i,j-1} + \delta_{ij} + \delta_{i,j+1}$.
(v) A weighted form of (iv); put $a_{ij} = \delta_{i,j-1} + K\delta_{ij} + \delta_{i,j+1}$, where, for example, $K = 2$.
In these cases, the various measures, $C(u)$ say, are equal either to the corresponding
$A(u)$ directly, or to a simple function of $A(u)$; we have
$$C(u) = F(A(u)) \quad\text{for all } (u), \tag{6}$$
where also
$$A(u) = F^{-1}(C(u)) \quad\text{for all } (u). \tag{7}$$
The score $S(u)$ for Kendall's $t_k$ (see eqn. (8)) can be expressed in the form (6), but for $n > 4$
the inverse function as in (7) does not exist (see Appendix I).
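The generalized matching coefficient is easy to compute directly, which makes the examples (i) to (iv) concrete. The sketch below is illustrative; the function names are hypothetical, the ranking $(3, 5, 2, 4, 1)$ is chosen only as an example, and the conversion from $\sum(i - u_i)^2$ to $r_s$ is the standard formula $r_s = 1 - 6\sum d^2/\{n(n^2 - 1)\}$, which plays the role of the function $F$ in (6).

```python
def matching_coefficient(u, a):
    """A(u) = sum over i of a(i, u_i), with positions i and ranks u_i counted from 1."""
    return sum(a(i, ui) for i, ui in enumerate(u, start=1))

def matches(i, j):            # (i)   Kronecker delta
    return 1 if i == j else 0

def squared_distance(i, j):   # (ii)  Spearman's r_s uses (i - j)^2
    return (i - j) ** 2

def footrule(i, j):           # (iii) Spearman's footrule uses |i - j|
    return abs(i - j)

def near_matches(i, j):       # (iv)  matches plus near misses
    return 1 if abs(i - j) <= 1 else 0

def spearman_from_squares(sum_sq, n):
    """F in (6) for example (ii): r_s = 1 - 6 * sum d^2 / (n (n^2 - 1))."""
    return 1 - 6 * sum_sq / (n * (n * n - 1))

u = (3, 5, 2, 4, 1)
n_matches = matching_coefficient(u, matches)
sum_sq = matching_coefficient(u, squared_distance)
foot = matching_coefficient(u, footrule)
near = matching_coefficient(u, near_matches)
r_s = spearman_from_squares(sum_sq, len(u))
print(n_matches, sum_sq, foot, near, r_s)
```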
The p.g.f. of $A(u)$ for the above model is simply
$$H_\pi\colon\ \sum_{(u)}P\{(u)\}z^{A(u)} = \frac{\operatorname{per}\,|\pi_i^{-j}z^{a_{ij}}|}{\operatorname{per}\,|\pi_i^{-j}|}.$$
However, in the absence of simple methods of manipulating permanents, we shall not
investigate this model further at present.
Barton & David (1956) have proposed an alternative hypothesis based on distorting the
matching distribution.

5. AN ALTERNATIVE METHOD

We now consider an alternative method of simplification of the model $H_\lambda$. The most general
model possible would specify the probability of each ranking separately, needing $n! - 1$
parameters. We have reduced this number to ${}^nC_2$ by assuming a special structure for
the model; it can only be further reduced by even more restrictive assumptions. Consider
the following additional assumption:
Assumption A. Rankings $(u)$, giving (when compared with the standard ranking)
the same values for both $r_s$ and $t_k$, have the same probability.
This assumption is equivalent to
Assumption A'. The probability of any ranking $(u)$ is a function only of
$$R(u) = \sum_{1 \le i < j \le n}(j - i)\operatorname{sgn}(u_j - u_i) = 2\sum_{1 \le i \le n}iu_i - \tfrac12 n(n + 1)^2 = \tfrac16 n(n^2 - 1)\,r_s$$
and
$$S(u) = \sum_{1 \le i < j \le n}\operatorname{sgn}(u_j - u_i) = \tfrac12 n(n - 1)\,t_k. \tag{8}$$
Consider also the following assumption:
Assumption B. Pairs of rankings which are inverses of one another have the same
probability.
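Two rankings $(u)$, $(v)$ are said to be inverses (sometimes 'conjugates') if whenever $u_i = j$
then $v_j = i$. Thus the rankings $(u) = (3, 5, 2, 4, 1)$ and $(v) = (5, 3, 1, 4, 2)$ are inverses since
$u_1 = 3$, $v_3 = 1$; $u_2 = 5$, $v_5 = 2$; etc.
It is well known (see, for example, Kendall, 1955, p. 6 (not p. 11)) that for such a pair of
rankings
$$R(u) = R(v), \qquad S(u) = S(v),$$
and so Assumption B is contained in Assumption A.
From Assumption B and eqn. (1) we may obtain certain relations between the parameters
$\lambda_{ij}$; thus for the pair of rankings above we obtain
$$P\{(u)\} = K_\lambda\,\lambda_{12}\lambda_{13}^{-1}\lambda_{14}\lambda_{15}^{-1}\lambda_{23}^{-1}\lambda_{24}^{-1}\lambda_{25}^{-1}\lambda_{34}\lambda_{35}^{-1}\lambda_{45}^{-1},$$
$$P\{(v)\} = K_\lambda\,\lambda_{12}^{-1}\lambda_{13}^{-1}\lambda_{14}^{-1}\lambda_{15}^{-1}\lambda_{23}^{-1}\lambda_{24}\lambda_{25}^{-1}\lambda_{34}\lambda_{35}\lambda_{45}^{-1},$$
and for these to be equal we must have
$$\lambda_{12}\lambda_{14} = \lambda_{24}\lambda_{35}.$$
Proceeding thus for all pairs of inverses we obtain the following relations (for $n = 5$):
$$\lambda_{12} = \lambda_{23} = \lambda_{34} = \lambda_{45} = \lambda_1 \text{ say},$$
$$\lambda_{13} = \lambda_{24} = \lambda_{35} = \lambda_2 \text{ say},$$
$$\lambda_{14} = \lambda_{25} = \lambda_3 \text{ say},$$
and we put $\lambda_{15} = \lambda_4$. Further,
$$\lambda_1\lambda_3 = \lambda_2^2, \qquad \lambda_2\lambda_4 = \lambda_3^2.$$
Similar relations (in fact subsets of the above) are obtained for $n = 3, 4$. These relations
are generalized in Appendix II and show that Assumption B contains
Assumption C. For any (fixed) $n$, we have
(i) $\lambda_{ij} = \lambda_{j-i}$ $(1 \le i < j \le n)$;
(ii) $\lambda_{k-1}\lambda_{k+1} = \lambda_k^2$ $(2 \le k \le n - 2)$.
Assumption C (ii) implies that the $\lambda_k$ are in geometrical progression, i.e. for some $\theta$, $\phi$ we
have $\lambda_k = \theta^k\phi$, i.e.
$$P\{U_i < U_{i+k}\} = \frac{\theta^{2k}\phi^2}{1 + \theta^{2k}\phi^2} = \tfrac12 + \tfrac12\tanh(k\log\theta + \log\phi).$$
Hence
$$\mathscr{H}(\theta, \phi)\colon\ P\{(u)\} = K_{\theta\phi}\prod_{1 \le i < j \le n}(\theta^{j-i}\phi)^{\operatorname{sgn}(u_j - u_i)} = K_{\theta\phi}\,\theta^{R(u)}\phi^{S(u)}. \tag{9}$$
Assumption C thus contains Assumption A. We deduce that all three assumptions are
equivalent. We shall take (9) to be our basic non-null model, $\mathscr{H}(\theta, \phi)$.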
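For small $n$ the model (9) can be tabulated by enumerating all $n!$ rankings. The sketch below is illustrative only (function names and parameter values are invented): it computes $R(u)$ and $S(u)$ from their definitions, checks the identity $R(u) = 2\sum iu_i - \tfrac12 n(n + 1)^2$, and confirms that inverse rankings receive equal probability, as Assumption B requires.

```python
import itertools

def sign(x):
    return (x > 0) - (x < 0)

def r_score(u):
    """R(u) = sum over i < j of (j - i) * sgn(u_j - u_i)."""
    n = len(u)
    return sum((j - i) * sign(u[j] - u[i]) for i in range(n) for j in range(i + 1, n))

def s_score(u):
    """S(u) = sum over i < j of sgn(u_j - u_i), the score for Kendall's coefficient."""
    n = len(u)
    return sum(sign(u[j] - u[i]) for i in range(n) for j in range(i + 1, n))

def inverse(u):
    """The ranking v with v_j = i whenever u_i = j (ranks counted from 1)."""
    v = [0] * len(u)
    for i, ui in enumerate(u, start=1):
        v[ui - 1] = i
    return tuple(v)

def model_probabilities(n, theta, phi):
    """P{(u)} proportional to theta ** R(u) * phi ** S(u) over all n! rankings."""
    weights = {u: theta ** r_score(u) * phi ** s_score(u)
               for u in itertools.permutations(range(1, n + 1))}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

u = (3, 5, 2, 4, 1)
probs = model_probabilities(4, theta=1.1, phi=1.3)   # illustrative parameter values
null_probs = model_probabilities(4, theta=1.0, phi=1.0)
print(r_score(u), s_score(u), inverse(u))
```

With $\theta = \phi = 1$ every ranking has probability $1/n!$, which recovers the null hypothesis.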


6. PROPERTIES OF THE MODEL $\mathscr{H}(\theta, \phi)$
Certain properties of this model are immediate.
(a) Let the null-hypothesis probability of a pair of values $(R, S)$ be
$$P(R, S) = (n!)^{-1}F(R, S),$$
so that there are just $F(R, S)$ different rankings $(u)$ having $R(u) = R$, $S(u) = S$. Then for
the present model we have
$$\mathscr{H}(\theta, \phi)\colon\ P_{\theta\phi}(R, S) \propto \theta^R\phi^S P(R, S).$$
If the p.g.f. under the null hypothesis is
$$H_0\colon\ M(x, y) = \sum P(R, S)\,x^Ry^S,$$
then the p.g.f. for the present model is
$$\mathscr{H}(\theta, \phi)\colon\ M_{\theta\phi}(x, y) = M(x\theta, y\phi)/M(\theta, \phi). \tag{10}$$
(b) The present model will be a special case of the one considered in §4 if (from eqn. (3))
$$\pi_i/\pi_{i+k} = \theta^{2k}\phi^2 \quad (1 \le i < i + k \le n);$$
this requires $\phi = 1$, and then we may take without loss of generality
$$\pi_i = \theta^{-2i} \quad (1 \le i \le n).$$
Then
$$P\{U_i < U_j\} = \theta^{-2i}/(\theta^{-2i} + \theta^{-2j}),$$
with
$$\mathscr{H}(\theta)\colon\ P\{(u)\} = \theta^{2\sum iu_i}\big/\operatorname{per}\,|\theta^{2ij}| \propto \theta^{R(u)}. \tag{11}$$
As it stands this model is unmanageable because of the permanents involved.
(c) It is known (Daniels, 1944; Hoeffding, 1948) that under the null hypothesis, the joint
distribution of $R$ and $S$ tends to the bivariate Normal form; i.e. that for any suitable† set
$\mathscr{S}$ of points $(R, S)$ containing $a$ such points we have
$$\sum_{(R,S)\in\mathscr{S}}P(R, S) \simeq \iint_A f(R, S)\,dR\,dS,$$
where $A$ is a convex region of area $4a$ containing the points of $\mathscr{S}$, but no other points of the
lattice; the factor 4 enters since $\Delta R = \Delta S = 2$. $f(R, S)$ is the bivariate Normal distribution
function with
$$\mu_R = 0, \qquad \sigma_R^2 = \frac{n^2(n - 1)(n + 1)^2}{36},$$
$$\mu_S = 0, \qquad \sigma_S^2 = \frac{n(n - 1)(2n + 5)}{18}, \qquad \rho = \frac{2(n + 1)}{\sqrt{2n(2n + 5)}}.$$
For the model $\mathscr{H}(\theta, \phi)$ we have approximately (for $a$ small)
$$\sum_{(R,S)\in\mathscr{S}}P_{\theta\phi}(R, S) \propto \theta^R\phi^S\iint_A f(R, S)\,dR\,dS.$$
Hence asymptotically, under suitable approaches to the limit for $\theta$ and $\phi$, $R$ and $S$ are
bivariate Normal with
$$E(R) \simeq \sigma_R^2\Big(\log\theta + \rho\frac{\sigma_S}{\sigma_R}\log\phi\Big),$$
$$E(S) \simeq \sigma_S^2\Big(\rho\frac{\sigma_R}{\sigma_S}\log\theta + \log\phi\Big).$$

Usual ive dis 

ed "teas demonstrate only that the curia nd to the Norm: 

bus © result a that the ordinates, ‘averaged over & Fat) has proved the resul 

Sog; PR of S. ie æ = l has not been proved: aden y viour 0 the bivariate result is more di 
» but in view of the well-known erratic e oak need to consider sets with 


e 
Ure i 7 
it to be true; however, for the present W 


R, S) 4 R4S, 
t for 0 and ¢, R and S are 


We 
dail p(B, S) =P- 
var S 208: 


log 9) 
the Normal form; what 
al ordinates. I believe 
t for the marginal dis- 


ig | The 
ifficult. 


th, Mir 
à 
tet 

l 


a= Q(n-0n03) = O(n"): 
Conditions under which this is the correct limiting form are as follows. The asymptotic result in the null case may be written


Ro [So 
im | Z na, 5 | f(R,S)aras| = " 
mox RSR, m——— 


where, as $n \to \infty$,
$$R_0/\sigma_R = O(1),\qquad S_0/\sigma_S = O(1),\qquad R_0/\sigma_R - S_0/\sigma_S = O(n^{-1/2}).$$
This last condition follows from the approach of $\rho$ to 1. (It is possible that weaker conditions (e.g. $R_0/\sigma_R = O(n^{?})$) may suffice; however, this is not known.) For the non-null asymptotic result to hold
$$\mathscr{E}(R)/\sigma_R = O(1),\qquad \mathscr{E}(S)/\sigma_S = O(1),\qquad \mathscr{E}(R)/\sigma_R - \mathscr{E}(S)/\sigma_S = O(n^{-1/2}),$$
whence $\sigma_R\log\theta = O(1)$, $\sigma_S\log\phi = O(1)$, i.e. $\log\theta = O(n^{-5/2})$, $\log\phi = O(n^{-3/2})$. Putting
$$\lim \sigma_R\log\theta = \mu',\qquad \lim \sigma_S\log\phi = \mu'',$$
we have
$$\lim \mathscr{E}(R)/\sigma_R = \mu' + \mu'' = \lim \mathscr{E}(S)/\sigma_S.$$

Thus the two parameters are asymptotically indistinguishable; each of them merely shifts the bivariate distribution in the direction $R/\sigma_R = S/\sigma_S$. We deduce further that $R$ and $S$ are asymptotically equivalent in the model $\mathscr{H}(\theta,\phi)$, i.e. they have asymptotically equal power for detecting a change in $(\sigma_R\log\theta + \sigma_S\log\phi)$, at least whenever the above is the correct limiting form.

(d) From the form of eqn. (9), $R(u)$ and $S(u)$ are respectively (and jointly) sufficient for $\theta$ and $\phi$; they will therefore provide most-efficient estimators whenever their asymptotic joint distribution is Normal. This brings out a distinction between the present model (which is a pure ranking model) and other models which have been suggested. Thus if a sample of $n$ is drawn from a bivariate Normal population with correlation $\rho$, and the variate values are replaced by ranks, then Pitman's asymptotic relative efficiency (A.R.E.) of $R$ and $S$ for estimating $\rho$ (relative to the product-moment correlation) is known to be $9/\pi^2$ (Hotelling & Pabst, 1936; Moran, 1951). Lehmann (1953) considers a certain non-parametric system of alternatives to independence, and shows that $R$ gives the optimum test for these alternatives, for all sample sizes. For the Thurstone-Babington Smith model, where … an estimate of the 'value' of each object …, Stuart (1954) has shown that the A.R.E. … is $3/\pi$. It is hoped in a further communication to investigate the relations between the present models and these alternatives.


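7. THE MODEL $\mathscr{H}(\phi)$

We shall now concentrate on the parameter $\phi$. Putting $\theta = 1$ in the basic model (9) we have
$$\mathscr{H}(\phi):\quad P\{(u)\} = K_n\,\phi^{S} \tag{12}$$
and
$$P(U_i \to U_j) = \pi = \frac{\phi}{\phi + \phi^{-1}} \qquad (1 \le i < j \le n).$$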
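Writing now $(u_n)$ for $(u)$ to denote the dependence on $n$, the p.g.f. of $S_n = S(u_n)$ under the null hypothesis is known to be (see, for example, Moran, 1950)
$$M_n(z) = \sum_{(u_n)} P\{(u_n)\}\,z^{S(u_n)} = \prod_{1\le k\le n} \frac{1}{k}\,(z^{k-1} + z^{k-3} + \dots + z^{1-k}), \tag{13}$$
whence the random variable $S_n$ can be expressed as the sum of $n$ independent random variables $s_k$ with p.g.f.'s
$$m_k(z) = \frac{1}{k}\,\frac{z^{k} - z^{-k}}{z - z^{-1}} = \frac{\sin k\tau}{k\sin\tau},$$
where $z = e^{i\tau}$. Thus $s_k$ has a uniform distribution on the $k$ points $s_k = k-1, k-3, \dots, 1-k$.

Hence Moran (1950) obtains the cumulant generating function (c.g.f.) of $s_k$ as
$$\log m_k(e^{i\tau}) = \sum_{m=1}^{\infty} \frac{(i\tau)^{2m}}{(2m)!}\,(-1)^{m}\,\frac{2^{2m-1}}{m}\,B_m\,(1 - k^{2m}), \tag{14}$$
where the $B_m$ are the Bernoulli numbers in the convention $B_1 = \tfrac16$, $B_2 = \tfrac1{30}$, ….

For the present model we have
$$M_{n,\phi}(z) = M_n(\phi z)/M_n(\phi) = \prod_{1\le k\le n} \frac{(\phi z)^{k} - (\phi z)^{-k}}{\phi z - (\phi z)^{-1}} \cdot \frac{\phi - \phi^{-1}}{\phi^{k} - \phi^{-k}}. \tag{15}$$
Hence $S_n$ can again be expressed as the sum of $n$ independent random variables $s_{k,\phi}$ with p.g.f.
$$m_{k,\phi}(z) = m_k(\phi z)/m_k(\phi).$$
Thus $s_{k,\phi}$ has a distribution on the $k$ points $k-1, k-3, \dots, 1-k$ with probabilities in geometric progression, proportional to $\phi^{s}$, and
$$K_n^{-1} = n!\,M_n(\phi). \tag{16}$$
Putting $\log\phi = \delta$ we have for the c.g.f. of $s_{k,\phi}$
$$\log m_{k,\phi}(e^{i\tau}) = \log\frac{\sinh(ki\tau + k\delta)\,\sinh\delta}{\sinh(i\tau + \delta)\,\sinh k\delta},$$
whence, writing $f_k = \coth k\delta$, we have
$$\kappa_{1k} = kf_k - f_1,$$
$$\kappa_{2k} = (f_1^2 - 1) - k^2(f_k^2 - 1),$$
$$\kappa_{3k} = 2k^3 f_k(f_k^2 - 1) - 2f_1(f_1^2 - 1),$$
$$\kappa_{4k} = 2(f_1^2 - 1)(3f_1^2 - 1) - 2k^4(f_k^2 - 1)(3f_k^2 - 1).$$

These reduce to Moran's values when $\delta \to 0$. An extreme case is obtained when $n \to \infty$ with $\delta$ constant; then $f_k \sim 1 + 2e^{-2k\delta}$ and we find
$$\kappa_{1k} \sim k - f_1,\qquad \kappa_{2k} \to f_1^2 - 1,\qquad \kappa_{3k} \to -2f_1(f_1^2 - 1),\qquad \kappa_{4k} \to 2(f_1^2 - 1)(3f_1^2 - 1).$$
The corresponding limiting distribution of $s_{k,\phi}$ is
$$P(s_{k,\phi} = k - 1 - 2j) = (1 - \phi^{-2})\,\phi^{-2j} \qquad (j = 0, 1, 2, \dots).$$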

year Ue 
Thus even in this case, the central limit theorem will apply; asymptotically
$$\{S_n - \tfrac12 n(n+1) + nf_1\}\,\{n(f_1^2 - 1)\}^{-1/2}$$
is a unit Normal variable. We notice that this is essentially a different limit from that considered in §6; it is not known whether the corresponding limits for $R(u_n)$ and for the bivariate distribution are Normal, or what happens to the correlation between $R_n$ and $S_n$. In intermediate cases, when $\log\phi \to 0$ slower than $n^{-3/2}$, we would expect the limiting distribution


of S, to be Normal, with — (e). 652). varS, = o(n). 


8. DECOMPOSITION OF $S_n$
In the null case, Feller (1945) has shown that the variables $s_k$ may be taken as
$$s_k = \sum_{1\le i\le k-1} \operatorname{sgn}(u_k - u_i);$$
this representation may be carried over to the $\phi$-model case. (This contradicts a remark of Moran's (1950).) Let us go back to the original paired-comparison approach. Suppose the judge has decided the $\binom{n-1}{2}$ preferences among $U_1, U_2, \dots, U_{n-1}$, and has as yet no inconsistencies. Then we have an $(n-1)$-ranking $(u_{n-1})$ say; and the p.g.f. of $S(u_{n-1})$ is $M_{n-1,\phi}$. Now the judge makes the final set of $n-1$ comparisons, $U_i \leftarrow$ or $\to U_n$ for $i = 1, 2, \dots, n-1$. These comparisons are independent amongst themselves and of the preferences already decided:
$$P(U_i \to U_n) = \pi \qquad (1 \le i \le n-1).$$

Of the $2^{n-1}$ possible sets of outcomes for these comparisons, only those sets will be consistent which insert $U_n$ into one of the $n$ intervals between the $n-1$ $U$'s already ranked; the probability that $U_n$ is ranked higher than the lower $j-1$ $U$'s and lower than the higher $n-j$ $U$'s is
$$\pi^{j-1}(1-\pi)^{n-j} \qquad (1 \le j \le n),$$
independently of the ranking $(u_{n-1})$.
This set of comparisons gives a ranking $(u_n)$ with
$$S(u_n) = S(u_{n-1}) + 2j - 1 - n = S(u_{n-1}) + s_n,$$
say. The probability that the comparisons are consistent is thus
$$\sum_{1\le j\le n} \pi^{j-1}(1-\pi)^{n-j} = \{\pi^{n} - (1-\pi)^{n}\}(2\pi - 1)^{-1} = C_n,$$
say; and we have as before that $s_n$ is a random variable independent of $S(u_{n-1})$ (and of $(u_{n-1})$) with
$$P(s_n) = C_n^{-1}\,\pi^{j-1}(1-\pi)^{n-j} \propto \phi^{s_n} \qquad (s_n = 2j - 1 - n;\ 1 \le j \le n). \tag{18}$$
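The insertion argument of this section can be checked numerically for small n. The sketch below is an illustration only, not part of the original paper: it builds the distribution over rankings by inserting U_2, ..., U_n in turn with the probabilities of eqn. (18), and compares it with the direct form K_n φ^S of eqn. (12). The function names are hypothetical, and it is assumed that u[i] holds the rank given to object i+1 and that π = φ²/(1+φ²).

```python
import itertools


def kendall_score(u):
    """S(u): sum over pairs i < j of sgn(u_j - u_i); u[i] is the rank of object i+1."""
    n = len(u)
    return sum((u[j] > u[i]) - (u[j] < u[i])
               for i in range(n) for j in range(i + 1, n))


def insertion_probs(m, phi):
    """P(u_m = j), j = 1..m: proportional to pi^(j-1) * (1-pi)^(m-j)."""
    pi = phi ** 2 / (1.0 + phi ** 2)
    weights = [pi ** (j - 1) * (1.0 - pi) ** (m - j) for j in range(1, m + 1)]
    total = sum(weights)  # C_m, the chance that the m-1 new comparisons are consistent
    return [w / total for w in weights]


def sequential_distribution(n, phi):
    """Exact distribution over rankings built by inserting U_2, ..., U_n in turn."""
    dist = {(1,): 1.0}
    for m in range(2, n + 1):
        probs = insertion_probs(m, phi)
        new_dist = {}
        for u, p in dist.items():
            for j in range(1, m + 1):
                # objects already ranked j or higher move up one place
                v = tuple(r + 1 if r >= j else r for r in u) + (j,)
                new_dist[v] = new_dist.get(v, 0.0) + p * probs[j - 1]
        dist = new_dist
    return dist


def direct_distribution(n, phi):
    """P{(u)} = K_n * phi^S(u), normalised by enumerating all n! rankings."""
    weights = {u: phi ** kendall_score(u)
               for u in itertools.permutations(range(1, n + 1))}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}
```

Enumeration is used rather than random sampling so that the comparison is exact; for larger n the same insertion step could be driven by a random number generator instead.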


9. AN INVARIANCE PROPERTY 


We now prove for the $\phi$-model an important invariance property, namely, that
$$p^{(n)}_{ij} = P(u_i < u_j \mid n, \phi, \text{comparisons are consistent})$$
depends only on $j - i$ and $\phi$, and not on $i$ or $n$. We prove that
$$p^{(n)}_{i,\,i+m-1} = p^{(m)}_{1,m} \qquad\text{for } 1 \le i \le n - m + 1;$$
this will be proved by induction on $n$. It is true by definition for $n = m$; assume that it has been proved for $n = m, m+1, \dots, a$. Fix $i$, with $1 \le i \le a - m + 1$. Consider the rankings $(u_a)$


wit! : 
vith S (ua) = S and having the property (property D) that t; < 1;,,, .,; we jus X dedu 
Quo). S 


iene over these rankings. We shall write ((u,) j) for a ranking (u,,,) which has 
a41 = j. We have by the inductive hypothesis 


pm = pi = b» »3 P{(u,)}- 
S 


1 ii-m-— 
i Á (uIL, 
Also 
(a+) ` -] , a 1 
BaS X Pld) E Ea uns 
S (1a 01, S S 1€j£atl (na), S 
asi 


Pus ((Q4,)) = S(u,) + 3j — 2—a, and ((u,)j) has the property L if and only if (u4) has; 
nce 


peaks EX  P((Qu) 
S 1«j£a41 (uL. S 
aK, 5 ores wt 
1<j<atl S (ta)|L,S 
a41.. g-a-l 
r—1p(m) — ) 
= Kou ó a Ka 7» = Pim 


)- m4 1; this case can be proved similarly 


1) 


This pr 
Us proves the result for all i except i = (4+1 
awe have finally pf? = Pr 


a E onaidering rankings (2q41) = ((u4))- Hence by induction on 
Si<i+k<n), where the {pp} depend only on eB l | 
a © may obtain the {p,} explicitly as follows: we write (i(wx—2) j) for a ranking (ux) which 
S w= 4, Uy, = He Then . bi 

S(ilur-2)J) = S(ug-2) + Aj—-i) +sgn (i—3) 
B" H H 

=p = x P(iu-23) 

Mya us velie (uk- 2) 

= by 2j-0-1 Y pSttr-2) 

E "m um 


= Ky (doi - bj (6-0 079 (9-97 Kho 


Whence for any $i$ $(1 \le i < i + k \le n)$
$$\gamma_k = \mathscr{E}\{\operatorname{sgn}(u_{i+k} - u_i)\} = 2p_k - 1 = (k+1)f_{k+1} - kf_k \tag{19}$$
in the notation of §7.
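From this result we have
$$\mathscr{E}(S) = \sum_{1\le i<j\le n} \mathscr{E}\operatorname{sgn}(u_j - u_i) = \sum_{1\le k\le n-1} (n-k)\,\gamma_k = \sum_{1\le k\le n} (kf_k - f_1),$$
in agreement with §7. Similarly we may obtain $\mathscr{E}(R)$:
$$\mathscr{E}(R) = \sum_{1\le i<j\le n} (j-i)\,\mathscr{E}\operatorname{sgn}(u_j - u_i) = \sum_{1\le k\le n-1} k(n-k)\,\gamma_k = \sum_{1\le k\le n} (2k - n - 1)\,kf_k. \tag{20}$$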
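Eqns. (19) and (20) can be verified by brute force for small n. The following sketch is an illustration under stated assumptions, not the author's computation: it enumerates all rankings with weight φ^S, taking R = Σ_{i<j} (j−i) sgn(u_j − u_i) and S = Σ_{i<j} sgn(u_j − u_i), and returns the exact expectations for comparison with the closed forms in f_k = coth kδ. All function names are hypothetical.

```python
import itertools
import math


def f(k, delta):
    """f_k = coth(k * delta), with delta = log(phi)."""
    return 1.0 / math.tanh(k * delta)


def gamma(k, delta):
    """gamma_k = (k+1) f_{k+1} - k f_k, the closed form of eqn. (19)."""
    return (k + 1) * f(k + 1, delta) - k * f(k, delta)


def sgn(x):
    return (x > 0) - (x < 0)


def enumerate_moments(n, phi):
    """Exact E(S), E(R) and E sgn(u_j - u_i) for every pair i < j, under P(u) ∝ phi^S."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = es = er = 0.0
    pair_sum = {p: 0.0 for p in pairs}
    for u in itertools.permutations(range(1, n + 1)):
        signs = {(i, j): sgn(u[j] - u[i]) for (i, j) in pairs}
        s = sum(signs.values())
        r = sum((j - i) * v for (i, j), v in signs.items())
        w = phi ** s
        total += w
        es += w * s
        er += w * r
        for p, v in signs.items():
            pair_sum[p] += w * v
    pair_mean = {p: v / total for p, v in pair_sum.items()}
    return es / total, er / total, pair_mean
```

The pairwise means returned for pairs with the same j − i should coincide, which is the invariance property of §9 in numerical form.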
10. INTERPRETATION OF THE PARAMETERS


Having now considered some of the consequences of the Assumptions of §5, we now consider interpretations of the two parameters $\theta$ and $\phi$. Each of them corresponds to a certain type of departure from the null hypothesis, that the judge cannot detect the real differences between the objects:

(a) The model $\mathscr{H}(\theta)$, by its association with the Bradley-Terry model, may be considered to be assigning a weight $w(i) = \theta^{2i}$ to each of the objects $U_i$ to be ranked; this weight remains constant throughout the process of making the paired-comparisons and arriving at the ranking. The paired-comparisons are made according to the rule
$$P\{U_i \to U_j\} = \frac{w(j)}{w(i) + w(j)} = \frac{\theta^{2j}}{\theta^{2i} + \theta^{2j}} = \frac{\theta^{k}}{\theta^{k} + \theta^{-k}},$$
where $k = j - i$ for $1 \le i < j \le n$. Thus this probability depends on the difference between the a priori ranks of the objects, but not on their absolute positions. While this seems an attractive hypothesis, it must be remembered that the rejection of inconsistent sets of comparisons introduces a distortion, the effect of which it is difficult to assess.


Table 1. $\mathscr{H}(\phi)$: Values of $p_k = P\{u_i < u_{i+k}\}$

  $\phi^2$      1.2            1.5       2.0
  $\pi$         $0.\dot5\dot4$   0.6       $0.\dot6$

  $k$
  1           0.5455         0.6000    0.6667
  2           0.5754         0.6632    0.7619
  3           0.6049         0.7215    0.8386
  4           0.6337         0.7737    0.8946
  5           0.6617         0.8191    0.9339
  6           0.6887         0.8577    0.9599
  7           0.7145         0.8897    0.9763
  8           0.7392         0.9155    0.9862
  9           0.7626         0.9361    0.9922


(b) The model $\mathscr{H}(\phi)$ puts for the paired-comparisons
$$P(U_i \to U_j) = \pi = \frac{\phi}{\phi + \phi^{-1}} \qquad (1 \le i < j \le n), \tag{21}$$
this probability being independent of both $i$ and $j$. This may not seem as reasonable an assumption as that for the $\theta$-model above; however, the quantities we are really interested in are the $p_{ij}$ of eqn. (2).

Table 1 gives the values of
$$p_k = P\{u_i < u_{i+k}\} \qquad (1 \le i < i + k \le n)$$
(from (19)) for three values of $\phi^2$, namely 1.2, 1.5 and 2.0, corresponding to
$$\pi = \frac{\phi^2}{1 + \phi^2} = 0.\dot5\dot4,\ 0.6,\ 0.\dot6.$$

It was proved in §9 that $p_k$ is independent of $i$ and $n$.
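Entries of Table 1 can be recomputed from eqn. (19). Since f_k = coth kδ = 2/(1 − φ^{-2k}) − 1, the relation γ_k = 2p_k − 1 gives p_k = (k+1)/(1 − q^{k+1}) − k/(1 − q^k) with q = φ^{-2}. The short sketch below (the function name is hypothetical) evaluates this form; it is offered as a check on the tabulated values rather than as the method originally used to compute them.

```python
def p_k(k, phi_sq):
    """p_k = P{u_i < u_{i+k}} under the phi-model, from eqn. (19) with q = 1/phi^2."""
    q = 1.0 / phi_sq
    return (k + 1) / (1.0 - q ** (k + 1)) - k / (1.0 - q ** k)


def table_rows(phi_squares=(1.2, 1.5, 2.0), k_max=9):
    """Rows (k, p_k for each phi^2), rounded to four places as in Table 1."""
    return [(k, [round(p_k(k, v), 4) for v in phi_squares])
            for k in range(1, k_max + 1)]
```

For k = 1 the expression reduces to 1/(1 + q) = φ²/(1 + φ²), i.e. to π itself, which is why the first row of the table repeats the values of π.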
It will be seen that these values progress in a very reasonable manner: the further apart in the true ranking two objects are, the more probably will the judge put them in the correct order.


metrically as j decreases to 1. 
(c) The general model $\mathscr{H}(\theta,\phi)$ (eqn. (9)) puts
$$P(U_i \to U_j) = \frac{\theta^{k}\phi}{\theta^{k}\phi + \theta^{-k}\phi^{-1}} = \tfrac12 + \tfrac12\tanh(k\log\theta + \log\phi),$$
where $k = j - i$, $1 \le i < j \le n$. This is a simple monotonic function of $k\log\theta + \log\phi$; thus the effects of $\theta$ and $\phi$ are in a sense additive. The term $\log\phi$ gives a departure from the null hypothesis which is the same for all pairs $(i, j)$; the other term adds to this a departure depending linearly on $k = j - i$. This model would appear to be an attractive approximation to the general model of eqn. (1).


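11. APPLICATIONS OF THE $\phi$-MODEL


From the form of the model $\mathscr{H}(\phi)$ (eqn. (12)) we have that $S(u)$ is sufficient for $\phi$. Problems of estimation and of hypothesis-testing will therefore be most naturally treated using $S$. However, since $R$ and $S$ are asymptotically equivalent, any procedure using $S$ may be replaced (approximately) by one using $R$.


We shall use the following notation:
$$\delta = \log\phi,\qquad \mathscr{E}_\phi = \mathscr{E}(S \mid \mathscr{H}(\phi)),\qquad V_\phi = \operatorname{var}(S \mid \mathscr{H}(\phi)),$$
$$V_0 = \operatorname{var}(S \mid \mathscr{H}_0) = \tfrac{1}{18}n(n-1)(2n+5),$$
which though not entirely consistent, will not lead to confusion. We obtain approximations from Moran's expression (14), which gives the expansion of $\log M_n(\phi)$ (eqn. (13)) in powers of $\delta$:
$$\log M_n(\phi) = \frac{\delta^2}{6}\sum_{1\le k\le n}(k^2 - 1) - \frac{\delta^4}{180}\sum_{1\le k\le n}(k^4 - 1) + \dots$$
$$= \frac{n(n-1)(2n+5)}{36}\,\delta^2 - \frac{n(n-1)(6n^3 + 21n^2 + 31n + 31)}{5400}\,\delta^4 + \dots,$$
$$\mathscr{E}_\phi = \phi\frac{d}{d\phi}\log M_n(\phi) = \frac{n(n-1)(2n+5)}{18}\,\delta - \frac{n(n-1)(6n^3 + 21n^2 + 31n + 31)}{1350}\,\delta^3 + \dots,$$
$$V_\phi = \phi\frac{d}{d\phi}\log M_n(\phi) + \phi^2\frac{d^2}{d\phi^2}\log M_n(\phi) = \frac{n(n-1)(2n+5)}{18} - \frac{n(n-1)(6n^3 + 21n^2 + 31n + 31)}{450}\,\delta^2 + \dots.$$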


? not; 
tice the curious result d g 
5- ad e 
We obtain further


n n? 
i 27 6% 
Hence VV —- 35e 


: ; Hg " ; of 
Table 2 gives values of the percentage error of the approximation to $V_\phi$ for various values of $n$ and $\phi$, with the corresponding values of $\Lambda$. It will be seen that the approximation is remarkably good even for small $n$, provided $\Lambda$ is not too large; except for very small $n$, the approximation improves with increasing $n$ if $\Lambda$ is held constant (compare the three entries with $\Lambda$ = 1.72 or 1.73).


Table 2. (¢): Percentage error of the approximation Vs 
with corresponding values of A= &,V 54 
96 error = 100(V,— V4) V5! 


gall ga 12 pea 15 $20 
n I ~ x 
96 error ^ 96 error A % error | A o% error ^ 
= — = f a Y E = i p — m i 
| 5 
2 | +011 | 005 | +038 | oo | +192 | o20 | + 575 | 997 
4 +0-08 0-14 031 0-27 +131 0-61 + 315 
6 | +005 0-25 047 0-49 +0-08 1-12 — 4:09 2:0 
8 + 0-02 0-39 — 0:02 0-75 —92.35 1:73 — 19-28 31 
10 —0-02 0-54 — 0-30 1:04 —6:03 | 244 
12 —0-07 0-70 -0-71 1:36 
14 —014 0-88 — 1:32 1-72 | 
16 —0-23 | 1.07 —2-16 211 
18 —034 | 128 
20 — 0-49 1:50 
22 —067 1-73 
24 — 0-90 1:97 | 
26 | -118 2.93 | 
| 
, à | e 
Under $\mathscr{H}(\phi)$, $S$ is approximately Normally distributed with mean $\mathscr{E}_\phi$ and variance $V_\phi$; for small departures from the null hypothesis, the variance is approximately given by the approximation above or, more crudely still, by $V_0$.

In the following it should be remembered that we are investigating the judges, and not the underlying ranking. This was given a priori; each judge makes repeated attempts to reproduce it; we are concerned with

(a) estimating the degree to which a judge can reproduce the ranking, i.e. (assuming he operates according to the $\phi$-model) estimating his $\phi$;

(b) comparing two attempts at ranking;

(c) testing for differences between the attempts of several judges.

Estimation. Suppose a judge has produced $l$ different rankings of the same $n$ objects; it is desired to estimate his $\phi$. We shall do this by the method of maximum likelihood. We have
$$\log L(\phi) = \log P\{(u)_1, (u)_2, \dots, (u)_l\} = \log\phi \sum_{j=1}^{l} S(u)_j - l\log\{n!\,M_n(\phi)\},$$
from eqns. (12) and (16); whence the estimation equation is
$$\left[\frac{d}{d\delta}\log M_n(\phi)\right]_{\delta=\hat\delta} = \bar S = \mathscr{E}(S \mid \hat\phi);$$
thus the model is fitted by its first moment. For small departures from $\mathscr{H}_0$, we have approximately
$$\bar S \approx V_0\log\hat\phi.$$
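The estimation equation can be solved numerically. The sketch below is an illustration only: it uses the closed form E(S | φ) = Σ_k (k coth kδ − coth δ) from §7 and a simple bisection on δ = log φ, which is an assumed choice of solver and not a procedure given in the paper. The function names, the search interval and the tolerance are all hypothetical.

```python
import math


def expected_S(n, delta):
    """E(S | phi) = sum_{k=1}^{n} (k coth(k delta) - coth(delta)), delta = log(phi)."""
    if abs(delta) < 1e-6:
        # leading term of the expansion in delta: E is approximately V_0 * delta
        return n * (n - 1) * (2 * n + 5) / 18.0 * delta
    return sum(k / math.tanh(k * delta) - 1.0 / math.tanh(delta)
               for k in range(1, n + 1))


def mle_delta(n, s_bar, bound=20.0, tol=1e-12):
    """Solve E(S | delta) = s_bar by bisection; E is increasing in delta."""
    max_s = n * (n - 1) / 2.0
    if not -max_s < s_bar < max_s:
        raise ValueError("no finite estimate: the mean score is at an extreme value")
    lo, hi = -bound, bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_S(n, mid) < s_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is adequate here because the derivative of E(S | δ) with respect to δ is the variance of S, which is positive; the crude estimate S̄/V_0 could serve as a starting value for a faster iteration.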
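We may obtain an approximate confidence interval for $\phi$; since under $\mathscr{H}(\phi)$, $\bar S$ is approximately Normal with mean $\mathscr{E}_\phi$ and variance $V_\phi/l$, we have approximately
$$P\{\bar S - \lambda_\alpha(V_\phi/l)^{1/2} < \mathscr{E}_\phi < \bar S + \lambda_\alpha(V_\phi/l)^{1/2}\} = 1 - 2\alpha, \tag{22}$$
where $\lambda_\alpha$ is the $100\alpha\%$ point of the unit Normal distribution; more crudely
$$P\{\bar S - \lambda_\alpha(V_0/l)^{1/2} < V_0\log\phi < \bar S + \lambda_\alpha(V_0/l)^{1/2}\} = 1 - 2\alpha.$$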
produced rankings 


Where g i 
y ó = h yet 95 E 
ro j have 
dges. Suppose two judges 
d with respective parameters d, and ġe Since S(u)ı 
, th means V, log 41 and Vlog ġa» and 


T 
pid consistency between two ji 
and 8 (ug), according to the ¢-mod Á 
is are approximately N ormally distributed wit) 
iances V, we may test whether d; = gz by referring 
(Su) — SNC N 
gh by Vj, from ( 


With var 


22); i.e. we refer 


to N | 
ormal tables. An improved test is obtained by replacin 
e S(w | 
(Su) — sw. fe — Bon (Su) + S(u)g) 
celihood ratio test. | 
à hie me judges have each produced lrankings 
We associate with each judge a value ¢; of 
= Pm the ‘between judges sum 


do sue 


to 
Normal tables. 
1s . 
Test "Sen is asymptotically equivalent t 
"di consistency between several judges- S 
res S, (j21,2,..,l5 i= 52 Lm). 
thesis 91 = 


ar. 
ameter ø; then under the hypo 
? 5, - 8.7 
puted asToXn-v and the ‘within 


t 
e 
P 
of 
» qu ares’ 
i=l 
ximately distri 


With the a 
ges g sual notation for means) is appro 

um of squares’ "E 3 

i xw m 

A all i between judges 
roxi t for differences judg 

s boty distributed as Vo Xmt-»" Thus ee s eel replace V, by the estimated 

Consistency within judges- More accurate 


$ ay 
S efore, 
The approximate powers of the above tests are immediately obtainable from standard Normal theory. Thus, for example, the power of the test to detect a difference between $\phi_1$ and $\phi_2$ when a two-tailed test is used is approximately
$$1 - F(\lambda_\alpha - \eta) + F(-\lambda_\alpha - \eta),$$
where $\lambda_\alpha$ is as before the $100\alpha\%$ Normal point, and
$$\eta = (\tfrac12 V_0)^{1/2}(\log\phi_1 - \log\phi_2).$$


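12. THE CONDITIONAL EXPECTATION OF $R$

We now obtain an expression giving the conditional expectation of $R$, given $S$; this is the same for the $\phi$-model as for the null case. Writing as in §6 $M(x, y)$ for the null-hypothesis p.g.f. of $R$ and $S$,
$$M(x, y) = \sum_{R,S} x^{R}y^{S}P(R, S),$$
the p.g.f. for the $\phi$-model is $M(x, y\phi)/M(1, \phi)$. Thus
$$\mathscr{E}_\phi(R) = \left[\frac{\partial}{\partial x}M(x, \phi)\right]_{x=1}\Big/ M(1, \phi). \tag{23}$$
However, we have for the null hypothesis
$$\left[\frac{\partial}{\partial x}M(x, y)\right]_{x=1} = \sum_{S} y^{S}P(S)\,\mathscr{E}(R \mid S),$$
and so from (20) and (23)
$$\sum_{S}\phi^{S}P(S)\,\mathscr{E}(R \mid S) = \mathscr{E}_\phi(R)\,M_n(\phi)$$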


=(n) Y (2k-n-1) EAS $$ g- 


2<k<n E 9-97 1ien $— 07 
= (nly X Gk-»-0)(E-1)993( 39-24. L(1-1)977 


ze 
x J] {pigi +9 
2<i<n 
iek 

13. CONCLUSION
The $\phi$-model exhibits several features of simplicity; however, difficulty has been encountered in attempting to evaluate such quantities as $P\{u_i = j\}$ and $P\{u_i = j, \dots\}$; knowledge of the latter quantity would enable the variance of $R$ and covariance of $R$ and $S$ to be obtained; hence also the conditional variance of $R$, given $S$.

It is hoped in a further communication to present:

(a) an investigation of the result of §12;

(b) a comparison of the present distributions of $R$ and $S$ with those obtained from other models, e.g. sampling from a bivariate Normal distribution;

(c) an extension of the method to cover those cases where the true ranking is not known a priori.
APPENDIX I
Demonstration that Kendall's $t_k$ is not essentially a 'generalized matching coefficient'


To Satish 
Sty eqn. (6) wi ' ] 
qn. (6) with C(u) = S(u) (the score for Kendall's £j), we could take (for n <9) 
gym j 10 
e.g. A(3, 5,2,4,1) = 35,241. We then define 
between 12,345 and 54,321) and have (6). 
relation (7) for all rankings (u) without 


when 4 

A(u) is si T 

the function xia the decimal number with digits &;; 
wt) suitably at the z! different values of x (e.g. 


owever 
rever, no matter w. 
having RR. Vies {æi} are, we cannot satisfy the inverse 
= ` values PGS) = x mili: " F 
1=1:; ues F-1(S) = gs equal for different S. Thus with n = 4, consider the rankings with 
A(u) 


S(u) F-\S(u)) 


yy cog + Aag + Mss 


6 Ie = 

4 Is = yy ios + Mga Oa 

4 Un x yy H Aog + Aga Qa 

2 [5 = yy + 63 + Aga H a 
Us 3) 2 Jo = hy +H + s + Ms 
(1, 4,3, 2 0 Jo = dyr + Moy + Mag H Aae 


Hene 

e fi A 
or any {2,;} we must have 

ga— 202 292 0 = 9- 

= 2,3, 4 in turn; together, they 


Th 
; nore are throe 
from the rankings with wy 


im ima " 
Ply that Io = similar equations obtainable 


g-e- A fortiori, (7) is impossible for n> 4. 


APPENDIX Il 
C we shall ex 
the cycle ( 


of inverse rankings. 


hibit various pairs 
2,1) or (6,2, 1, 5) or 


T 
9 prove t} 
1,5, 6, 2) (or (5, 6, 


* hat Ass š 

(2, Shall use NAR Ser B contains Assumption 

*5,6)) will rep cycle? notation; thus with n = 6, 
represent the ranking 

=o ma iy a> 3, w = 4 


uy = 5, Us = 6, Us 


ed by the cycle (2, 6, 5,1), 
t equivalent to itself will 
quired relations 


5) is represent 
xion is nO 
btain as in $5 there 


2,0,3, 4,1, 


le 
* the 
Whig, king (5 
ich į mg (5 ; ite ; 
(5, 1,3, 4,6, 2). The inverse ranking ( a 
whose rette: 


Is th 
Thus any cycle 


Speci Ne re ‘ 
be Fou of the first cyclo. 
en the A p rankings. From the following cycles We now Oo 
u's 
Cycl 
To Relation 
Prave eii) 
(2,3, : Aia = Avs 
. Ags = \ . 
(1,2,3 Shes olia Hence aaa (1«i«n-1) 
(3.4 5) Anis = Ans we. Aas = Aza 
(4,5) Assay = Assis ie, An = Às d 
0,2,3 ete. : nli Hence Aia = ^s (1«i«n-2) 
73,4, 5) AyeAisArs = Aes Ass Aas i.e. Aas ae eo 
ete. i n Hence Nis = 4 (1<ts ae 
T 3hdas tn, Hence Anusk = Ay (PSs +k<n) 
i Prove C i): 
Fs l i =e 
L4 " 2) AisAog Acs = Ards re. AjAs = “8 
KE Ea A.M 
B diis 2X Q«hen-l 
ete. at á Hence Apae = A (2<k<n ) 


Biom. 44 


| 
THE GENERALIZATION OF PROBIT ANALYSIS TO THE
CASE OF MULTIPLE RESPONSES


By J. AITCHISON AND S. D. SILVEY
University of Glasgow 


1. INTRODUCTION 


The generalization of probit analysis which we will discuss in this paper arose from a problem in entomology. We will suggest a solution to this particular problem and then consider a more general situation where this method of solution might be appropriate.

The particular problem is as follows. In the course of its lifetime the Petrobius Leach (Thysanura, Machilidae), which is a primitive wingless insect allied to the domestic silver-fish, passes through various stages, technically referred to as 'instars'. A problem of some interest is to estimate the mean time spent by such insects in each instar (stage). Because of the difficulties involved in keeping these insects alive in the laboratory (and consequently in keeping them under effectively continuous observation during their lifetime) they were sampled at various dates in the field, and those observed were classified according to instar. The actual set of data to be analysed is given in Table 1.* The insects started hatching on 30 April and times will be measured from this date. The problem is to estimate from these data the mean time spent in each stage.


Table 1. Numbers of insects observed in various stages 


Stage 
iii Total 
1 2 3 4 5 6 
3 = : ls : 
29 May 31 2 à ð o o s 
T 18 e k ; 0 0 150 
ay 18 90 38 4 0 130 
26 June 0 31 A 23 1 k us 
ae} Se 0 2 12 65 
2. BASIC MATHEMATICAL MODEL
P a i i i i l 
s us he Purposes of a mathematical model of the situation outlined A ps oap m 
“ii nt of an insect, 
> n in the developme 
: ere are s+1 stages In m 
by ia is necessarily in Red of rtm stages. [ne final ($+ 1)th stage is cma a 
' i rs fo f 
E is i -. ing any unknown paramete 
We cts and there is no question of CAM Fi n Seth em ERA, 
re m 


SO ; 

& l Suppose that observations ® 
> 

* (em). E 

" T 1957). Stage 1in Table 1 consists 

echnique D Deed to be difficulty in distin- 


f 4 T ej i 
hatch; ntomological details and sampling t tages, S! 


Buig Ching 4; 
shi, "E time together with three natural 8 


a 
mong these. m 
The time spent by an insect in the $i$th stage $(i = 1, 2, \dots, s)$ can be regarded as an observation on a non-negative random variable $\xi_i$, and our problem is to estimate the mean value $\lambda_i$ of each $\xi_i$. Also the total time spent by an insect in stages $1, 2, \dots, r$ can be regarded as an observation on a random variable $\eta_r = \sum_{i=1}^{r}\xi_i$. If $\mu_r = \mathscr{E}(\eta_r)$ $(r = 1, 2, \dots, s)$, then clearly we can estimate $\lambda_i$ $(i = 1, 2, \dots, s)$ by estimating $\mu_r$ $(r = 1, 2, \dots, s)$ and then taking differences of these estimates.

If, for each $r$, the distribution function $G_r$ of $\eta_r$ is continuous, so that
$$\Pr(\eta_r < x) = \Pr(\eta_r \le x) = G_r(x),$$
then the probability $\pi_i(x)$ that at time $x$ an insect chosen at random will be in the $i$th stage is given by
$$\pi_1(x) = 1 - G_1(x),$$
$$\pi_i(x) = G_{i-1}(x) - G_i(x) \qquad (i = 2, 3, \dots, s),$$
$$\pi_{s+1}(x) = G_s(x).$$
The expressions given for $\pi_1(x)$ and $\pi_{s+1}(x)$ are obvious. For $i = 2, 3, \dots, s$, we have, in the usual notation,
$$\pi_i(x) = \Pr(\eta_{i-1} < x,\ \eta_i \ge x) = \Pr(\eta_{i-1} < x) - \Pr(\eta_{i-1} < x,\ \eta_i < x) = \Pr(\eta_{i-1} < x) - \Pr(\eta_i < x),$$
since the distribution functions of the $\eta_r$'s are continuous and since $\eta_i < x$ implies $\eta_{i-1} < x$.

These expressions for $\pi_i(x)$ $(i = 1, 2, \dots, s+1)$ enable us to express the likelihood of the observed data explicitly in terms of the distribution functions of the $\eta_r$'s.

We now assume that, for $i = 1, 2, \dots, s$, the distribution of the random variable $\xi_i$ is approximately normal with standard deviation $\sigma_i$, small relative to $\lambda_i$. Then $\eta_r$ also is approximately normally distributed and we may take
$$G_r(x) = \Phi(z_r) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z_r} e^{-\frac12 t^2}\,dt,$$
where
$$z_r = \frac{x - \mu_r}{\theta_r}\qquad\text{and}\qquad \theta_r^2 = \operatorname{var}(\eta_r) = \operatorname{var}\Big(\sum_{i=1}^{r}\xi_i\Big).$$
This assumption completes the basic mathematical model,

jt? 
go 
a ji 

hs 
standard deviation is small relative to its me. 
say, 30, <A,. It must be emphasized that in other applications this condition may not e i 
ome other form for the distributions of the *' 


negative random variables. 
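As an illustration of the basic model, the stage probabilities can be evaluated directly from the normal form of the distribution functions G_r. The sketch below is not from the paper: the function names are hypothetical, the parameter values used in any call are made up, and the normal integral is computed from the error function rather than read from tables. It assumes the μ_r and θ_r are such that G_1(x) ≥ G_2(x) ≥ … at the times of interest, which is what the model requires for the differences to be probabilities.

```python
import math


def normal_cdf(z):
    """Phi(z), the unit Normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stage_probabilities(x, mu, theta):
    """pi_1(x), ..., pi_{s+1}(x) for mu = [mu_1..mu_s], theta = [theta_1..theta_s].

    pi_1 = 1 - G_1, pi_i = G_{i-1} - G_i (i = 2..s), pi_{s+1} = G_s,
    with G_r(x) = Phi((x - mu_r) / theta_r).
    """
    G = [normal_cdf((x - m) / t) for m, t in zip(mu, theta)]
    middle = [G[i - 1] - G[i] for i in range(1, len(G))]
    return [1.0 - G[0]] + middle + [G[-1]]
```

For example, with hypothetical values mu = [20, 45, 75] and theta = [4, 6, 8], an insect observed at time 40 is most probably in the second stage, and the s + 1 probabilities sum to one by construction.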


D 
' 


3. TEST OF THIS MODEL
In the above model, the only real assumption made is that of approximate normality of the $\xi_i$'s. At this point a very easily applied test of this assumption is available. For under the assumption we have
$$\pi_1(x) = 1 - \Phi(z_1) = \Phi(-z_1)$$
and so
$$\sum_{i=1}^{r}\pi_i(x) = 1 - \Phi(z_r) = \Phi(-z_r) = \Phi\Big(\frac{\mu_r - x}{\theta_r}\Big).$$
Now if $n_{\alpha i}$ denotes the number of insects observed in stage $i$ at time $x_\alpha$, $N_\alpha$ is the total number observed at time $x_\alpha$, and $N_\alpha p_{\alpha i} = n_{\alpha i}$, then $p_{\alpha i}$ estimates $\pi_i(x_\alpha)$ and $\sum_{i=1}^{r} p_{\alpha i}$ estimates $\sum_{i=1}^{r}\pi_i(x_\alpha)$.

Hence if $Y_{\alpha r} = \Phi^{-1}\big(\sum_{i=1}^{r} p_{\alpha i}\big)$, the equivalent normal deviate of $\sum_{i=1}^{r} p_{\alpha i}$, then, for fixed $r$, the points $(x_\alpha, Y_{\alpha r})$ $(\alpha = 1, 2, \dots, m)$ will be well fitted by a straight line. By plotting these points and considering for each $r$ their nearness to linearity we can decide whether or not the assumption of normality is justified and the model adequate.
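The graphical check described above can be sketched in code. The example below is an assumption-laden illustration, not the authors' procedure: it converts the proportions observed at one date into equivalent normal deviates using the inverse Normal distribution function from Python's standard library, and fits a least-squares line where the paper fits by inspection of a plot. The function names are hypothetical. Cumulative proportions equal to 0 or 1 have no finite deviate and are returned as None.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()


def equivalent_deviates(proportions):
    """Y_r = Phi^{-1}(p_1 + ... + p_r) for r = 1..s, given proportions for s+1 stages."""
    deviates, cumulative = [], 0.0
    for p in proportions[:-1]:  # the final stage is omitted: the full sum is 1
        cumulative += p
        if 0.0 < cumulative < 1.0:
            deviates.append(_UNIT_NORMAL.inv_cdf(cumulative))
        else:
            deviates.append(None)
    return deviates


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Under the model the fitted line for stage r is Y = (μ_r − x)/θ_r, so it crosses the x-axis at an estimate of μ_r and its gradient estimates −1/θ_r.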


Fig. 1. Equivalent normal deviate of $\sum_{i=1}^{r} p_{\alpha i}$ plotted against time. (Key: a separate plotting symbol is used for each value of $r$ = 1, 2, 3, 4, 5.)


The diagram for the above set of data is given in Fig. 1. In considering this diagram it seems reasonable to give little weight to those points based on values of $\sum_{i=1}^{r} p_{\alpha i}$ near 0 or 1, since small variations in numbers observed can cause considerable movement of such points. It will be seen that for those values of $r$ for which more than one 'reasonable' point is available, these points lie very near straight lines.
4. ESTIMATION
We are now in a position to estimate the unknown parameters $\mu_i$ $(i = 1, 2, \dots, s)$ by the method of maximum likelihood. For now the likelihood of the observed set of data can be expressed explicitly in terms of the $2s$ unknown parameters $\mu_i$ and $\theta_i$ $(i = 1, 2, \dots, s)$. While the $2s$ maximum-likelihood equations for their estimates are readily derived they are not simple and have to be solved numerically by an iterative procedure.

We remark at this point that the solution of these equations is tantamount to the fitting of straight lines to each of the sets of points $(x_\alpha, Y_{\alpha r})$ $(\alpha = 1, 2, \dots, m)$. Initial approximations to the maximum-likelihood estimates can be obtained by fitting straight lines to these sets of points. For given $r$ a straight line fitted thus to the points $(x_\alpha, Y_{\alpha r})$ $(\alpha = 1, 2, \dots, m)$ will cross the $x$-axis near the maximum-likelihood estimate of $\mu_r$, while the gradient will approximate to the maximum-likelihood estimate of $-\theta_r^{-1}$.

However, we omit further details of this case since these are similar to those given in §6, where the procedure of obtaining estimates using a modified model is demonstrated for the data of §1.


5. MODIFICATIONS OF THE BASIC MODEL 


Tt will be seen from the discussion in $4 that the given data are really 


insufficient. for good 
estimation of 41, #2 


...,//5, using the basic model, since of the five straight lines to be fitted 
only three are being fitted to ' reasonable’ sets of points. This, together with other considera- 
tions based less on expediency, suggests modifications to the basic model which involve 5 
reduction of the number of unknown parameters and which might prove useful in similar 
situations. 

The variance c? of the random variable £; describing the time spent in the ith $ 


been regarded, up to this point, as a parameter independent of A,, the me 
stage i. Further, no assumption h 


tage has 
an time spent in 
as been made regarding the dependence or independence? 
of the random variables £; (i = 1,2,...,8). By making assumptions of this nat 
reduce the number of parameters to be estimated. 

There are two aims to be achieved in introducing such additional assumptions: first, t0 
make them as natural as possible, and secondly, to ensure that from the point of view A 
computation, the estimation problem they give rise to is practicable. The * natural’ assump 
tions make c? a simple function of the mean A;, whereas in order that the computation shou 3 
not be too complicated, we would like to assume that 07, the variance of 7,, is a simple fien 
tion of u, the mean of 7,. Sometimes these aims are compatible; sometimes they are} 
conflict. To illustrate this we consider the following three ‘natural’ assumptions 


ure we 09? 


(i) oF =0°, 
$ 2:995, 


(ili) o2? 


= 
zu 
T 
qa 
to 
ll 


d "By 
21 ù : 
where in each case o? is a constant. If now we further assume independence of the */ , 


eit. dm i " " 
i.e. if we assume that the time spent by an insect in any one stage is independent of the ui 
spent by it in any other stage, then, in terms of 0, and z, these assumptions become 


ü)y 6 =ro?, 


(i) 2 = mo, 
ET r 
(iiy Of = L un -m| e. 


; ; „ (D 
From both points of view the assumptions (i)' and (ii)' are ‘ good’, However, while M 
is à very natural assumption it leads to a very unwieldy estimation problem. e WO wd 
like to be able to replace (iii)’ by 02 = 20%. But this is an artificial assum: ption which a 


. H H ttt nf 
be consistent with the assumption (iii) only if we assume very high correlation betwe di 
two of the £;'s. 


The assumption which seemed to the authors most reasonable for the given set of data was the assumption (ii) in conjunction with independence of the $\xi_i$'s, i.e. essentially assumption (ii)'. An initial test of whether such an assumption is valid is obtained by reference to Fig. 1. For if the assumption is valid then a straight line fitted to the set of points $(x_a, Y_{ar})$, $r$ fixed $(a = 1, 2, \ldots, m)$, should approximate to the line with equation

$$Y = \frac{x - \mu_r}{\sigma\sqrt{\mu_r}}.$$

From the point where this fitted line crosses the x-axis and from its gradient, an estimate of $\sigma$ may be obtained without difficulty. If, as $r$ varies, the different estimates of $\sigma$ thus obtained are in fairly close agreement and show no definite trend as $r$ increases, then the additional assumption (ii)' may be taken to be justified. This was found to be the case for the given set of data, if again not too much weight is attached to points based on values of $\sum_{i=1}^{r} p_{ai}$ near 0 or near 1.

Usually it will be possible to test an additional assumption of this nature in a similar fashion.
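The graphical check just described can be sketched in a few lines. This is only an illustration under assumption (ii)': the function name is hypothetical, and the intercepts and gradients are made-up values, not readings from Fig. 1.

```python
import math

def sigma_from_line(x_intercept, gradient):
    """Recover (mu_r, sigma) from a fitted line Y = (x - mu_r) / (sigma * sqrt(mu_r)).

    The line crosses the x-axis at x = mu_r and has gradient 1 / (sigma * sqrt(mu_r)).
    """
    mu_r = x_intercept
    sigma = 1.0 / (gradient * math.sqrt(mu_r))
    return mu_r, sigma

# Hypothetical (intercept, gradient) pairs for r = 1, 2, 3; illustrative only.
fitted_lines = [(38.0, 0.162), (50.7, 0.140), (62.7, 0.126)]
for r, (x0, g) in enumerate(fitted_lines, start=1):
    mu_r, sigma = sigma_from_line(x0, g)
    print(f"r={r}: mu_r={mu_r:.1f}, sigma={sigma:.3f}")
```

If the values of sigma printed for successive r agree closely and show no trend, assumption (ii)' is not contradicted by the diagram.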
6. APPLICATION TO GIVEN DATA 


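Summarizing the model used to fit the given data we have: the probability $\pi_i(x_a)$ that at time $x_a$ an insect chosen at random will be in the $i$th stage is given by

$$\pi_1(x_a) = 1 - \Phi\left(\frac{x_a - \mu_1}{\sigma\sqrt{\mu_1}}\right),$$

$$\pi_i(x_a) = \Phi\left(\frac{x_a - \mu_{i-1}}{\sigma\sqrt{\mu_{i-1}}}\right) - \Phi\left(\frac{x_a - \mu_i}{\sigma\sqrt{\mu_i}}\right) \quad (i = 2, 3, \ldots, s),$$

$$\pi_{s+1}(x_a) = \Phi\left(\frac{x_a - \mu_s}{\sigma\sqrt{\mu_s}}\right),$$

where $\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-\frac{1}{2}t^2}\,dt$. Also the likelihood of the observed data is given by

$$\log L = k + \sum_{a=1}^{m}\sum_{i=1}^{s+1} n_{ai}\log\pi_i(x_a),$$

where $k$ is a constant and $n_{ai}$ is the number of insects observed in stage $i$ at time $x_a$.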
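The model can be evaluated directly with the standard library. The sketch below assumes the form $\theta_r^2 = \mu_r\sigma^2$ stated above; the function names are illustrative, the parameter values are the maximum-likelihood estimates reported in Table 2, and the counts passed to the likelihood are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal integral Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def stage_probabilities(x, mu, sigma):
    """Return [pi_1, ..., pi_{s+1}] at time x, given mu = [mu_1, ..., mu_s]."""
    # cum[r-1] = P(eta_r <= x): the insect has left stage r by time x
    cum = [normal_cdf((x - m) / (sigma * math.sqrt(m))) for m in mu]
    probs = [1.0 - cum[0]]
    probs += [cum[i - 1] - cum[i] for i in range(1, len(mu))]
    probs.append(cum[-1])
    return probs

def log_likelihood(times, counts, mu, sigma):
    """Sum of n_ai * log pi_i(x_a), omitting the additive constant k."""
    total = 0.0
    for x, row in zip(times, counts):
        for n_ai, p in zip(row, stage_probabilities(x, mu, sigma)):
            if n_ai > 0:          # cells with no insects contribute nothing
                total += n_ai * math.log(p)
    return total

mu_hat = [38.3, 50.9, 62.5, 71.9, 87.9]   # Table 2 estimates
sigma_hat = 1.01
print([round(p, 4) for p in stage_probabilities(60.0, mu_hat, sigma_hat)])
```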
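We now have

$$\frac{\partial\pi_i(x_a)}{\partial\mu_i} = -\frac{\partial\pi_{i+1}(x_a)}{\partial\mu_i} = \frac{(x_a + \mu_i)\,Z_{ai}}{2\sigma\mu_i^{3/2}} \quad (i = 1, 2, \ldots, s),$$

where $Z_{ai} = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t_{ai}^2}$ and $t_{ai} = (x_a - \mu_i)/(\sigma\sqrt{\mu_i})$.

Also

$$\frac{\partial\log L}{\partial\mu_i} = \sum_{a=1}^{m}\left[\frac{n_{ai}}{\pi_i(x_a)} - \frac{n_{a,i+1}}{\pi_{i+1}(x_a)}\right]\frac{\partial\pi_i(x_a)}{\partial\mu_i}.$$

Further

$$\frac{\partial\pi_i(x_a)}{\partial\sigma} = \frac{1}{\sigma}\left(t_{ai}Z_{ai} - t_{a,i-1}Z_{a,i-1}\right) \quad (i = 2, 3, \ldots, s),$$

with $\partial\pi_1(x_a)/\partial\sigma = t_{a1}Z_{a1}/\sigma$ and $\partial\pi_{s+1}(x_a)/\partial\sigma = -t_{as}Z_{as}/\sigma$, and

$$\frac{\partial\log L}{\partial\sigma} = \sum_{a=1}^{m}\sum_{i=1}^{s+1}\frac{n_{ai}}{\pi_i(x_a)}\frac{\partial\pi_i(x_a)}{\partial\sigma}.$$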


The problem of estimating the $\mu_i$'s is thus reduced to the problem of solving a set of $s+1$ equations each of which involves only functions which are either tabulated or easily computed. (The only tabulated functions which it was found necessary to use were tables of the normal ordinate and integral.) The computational procedure follows naturally from the form in which the formulae [...].

These equations were solved in the present instance by an iterative procedure essentially that used in probit analysis (see, for example, Finney, 1947). It consists of (i) obtaining initial approximations to the roots of the equations, (ii) evaluating $W$, the information matrix with the unknown parameters involved replaced by these initial approximations, (iii) evaluating $W^{-1}$ and (iv) obtaining corrections to the initial approximation by multiplying by $W^{-1}$ the vector of partial derivatives of $\log L$ with respect to the parameters, calculated with these parameters replaced by the initial approximations.

This method is a modified form of Newton’s method of solving numerically a set of simul- 
taneous equations and has the advantage over the latter method that it yields finally, 
without additional computation, an estimate of the variance matrix of the estimators. 

Also the matrix is more easily computed than that used in Newton’s method. 
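Steps (i) to (iv) can be illustrated on the simplest related problem. The sketch below is not the authors' computation: it applies the same scoring iteration to an ordinary two-parameter probit model $P = \Phi(a + bx)$, with made-up doses and counts, so that the information matrix is only 2 by 2 and can be inverted by hand.

```python
import math

def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def scoring_step(a, b, doses, n, r):
    """One iteration: new estimate = old estimate + W^{-1} * (score vector)."""
    ga = gb = waa = wab = wbb = 0.0
    for x, ni, ri in zip(doses, n, r):
        p, z = cdf(a + b * x), pdf(a + b * x)
        u = (ri - ni * p) * z / (p * (1.0 - p))   # contribution to d(log L)/d(a)
        w = ni * z * z / (p * (1.0 - p))          # contribution to the information
        ga += u
        gb += u * x
        waa += w
        wab += w * x
        wbb += w * x * x
    det = waa * wbb - wab * wab
    w_inv = [[wbb / det, -wab / det], [-wab / det, waa / det]]
    a_new = a + w_inv[0][0] * ga + w_inv[0][1] * gb
    b_new = b + w_inv[1][0] * ga + w_inv[1][1] * gb
    return a_new, b_new, w_inv

def fit(doses, n, r, a, b, iterations=20):
    """Iterate from initial approximations (a, b); also return the last W^{-1}."""
    w_inv = None
    for _ in range(iterations):
        a, b, w_inv = scoring_step(a, b, doses, n, r)
    return a, b, w_inv

# Hypothetical data: expected responders under a = -3, b = 1, 100 subjects per dose.
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
n = [100.0] * 5
r = [100.0 * cdf(-3.0 + x) for x in doses]
a_hat, b_hat, variance_estimate = fit(doses, n, r, a=-2.5, b=0.8)
print(round(a_hat, 4), round(b_hat, 4))
```

The matrix returned alongside the estimates is the inverse information matrix, which is the by-product the text describes as an estimate of the variance matrix of the estimators.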

We now discuss the points (i) and (ii) for our particular case.

(i) We emphasize again that the solution of the maximum-likelihood equations is tantamount to the fitting of straight lines to the sets of points $(x_a, Y_{ar})$, $r$ fixed $(a = 1, 2, \ldots, m)$.
Hence initial approximations to the roots of these equations can be obtained by fitting straight 
lines roughly to these sets of points. In order to reduce the number of iterations required 
in the process of solution it seems advisable to take considerable care with the fitting of these 
lines, and in so far as possible to attach weights to different points on the basis previously 
discussed. Moreover, scrutiny of this diagram can yield considerable information about 
how well we can expect different parameters to be estimated. It is obvious, for instance, that in the present case we can expect $\mu_1$, $\mu_2$ and $\mu_3$ to be well estimated and the remaining $\mu_i$'s not so well estimated. Hence before any heavy computations are carried out the
experimenter can decide whether the experiment he has performed is likely to yield as much 
information as he desires or whether it might be necessary, for example, to repeat the 
experiment with observations made at closer intervals. 

Fig. 2 shows the initial lines fitted in the present case. The lines resulting from the final 
solution of the maximum-likelihood equations are virtually indistinguishable from these. 
The corresponding initial and final estimates of the parameters are given in Table 2. Two iterations were required to produce the final solution of the maximum-likelihood equations. It will be seen that these produce little change in the initial approximations to the estimates of the $\mu_i$'s, the parameters in which we are primarily interested. So graphical estimation of these parameters may be quite efficient.


Table 2

Parameter                      mu_1   mu_2   mu_3   mu_4   mu_5   sigma
Initial approximation          38.0   50.7   62.7   72.0   88.0   1.00
Maximum-likelihood estimate    38.3   50.9   62.5   71.9   87.9   1.01


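(ii) For the evaluation of $W$ we have

$$E\left(-\frac{\partial^2\log L}{\partial\mu_i^2}\right) = \sum_{a=1}^{m} N_a\left[\frac{1}{\pi_i(x_a)} + \frac{1}{\pi_{i+1}(x_a)}\right]\left[\frac{\partial\pi_i(x_a)}{\partial\mu_i}\right]^2 \quad (i = 1, 2, \ldots, s),$$

$$E\left(-\frac{\partial^2\log L}{\partial\mu_i\,\partial\mu_{i+1}}\right) = \sum_{a=1}^{m}\frac{N_a}{\pi_{i+1}(x_a)}\,\frac{\partial\pi_{i+1}(x_a)}{\partial\mu_i}\,\frac{\partial\pi_{i+1}(x_a)}{\partial\mu_{i+1}} \quad (i = 1, 2, \ldots, s-1),$$

and

$$E\left(-\frac{\partial^2\log L}{\partial\mu_i\,\partial\mu_j}\right) = 0 \quad\text{if } |i - j| \ge 2.$$

Also

$$E\left(-\frac{\partial^2\log L}{\partial\sigma^2}\right) = \sum_{a=1}^{m} N_a\sum_{i=1}^{s+1}\frac{1}{\pi_i(x_a)}\left[\frac{\partial\pi_i(x_a)}{\partial\sigma}\right]^2$$

and

$$E\left(-\frac{\partial^2\log L}{\partial\mu_i\,\partial\sigma}\right) = \sum_{a=1}^{m} N_a\left[\frac{1}{\pi_i(x_a)}\frac{\partial\pi_i(x_a)}{\partial\sigma} - \frac{1}{\pi_{i+1}(x_a)}\frac{\partial\pi_{i+1}(x_a)}{\partial\sigma}\right]\frac{\partial\pi_i(x_a)}{\partial\mu_i}.$$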


In the computation of these expectations it is necessary to calculate the $1/[\pi_i(x_a)]$'s. Sometimes $\pi_i(x_a)$ will be zero to the accuracy of the calculations. The terms in which such $1/[\pi_i(x_a)]$'s are involved will invariably be negligible and as a working rule we may take $1/[\pi_i(x_a)] = 0$ whenever $\pi_i(x_a) = 0$.

[Fig. 2. Initial lines fitted to points $(x_a, Y_{ar})$. Vertical axis: equivalent normal deviate $Y$.]

The information matrix takes a simple form in this case and advantage may be taken of the empty cells in it in the inversion process. The authors inverted it by successive [...] (Frazer, Duncan & Collar, 1947, p. 112).

The matrix $W^{-1}$ is an estimate of the variance matrix $V$ of the maximum-likelihood estimators of the unknown parameters and $W^{-1}$ may be used without alteration in successive iterations to obtain corrections to successive approximations to maximum-likelihood estimates. However, a better estimate of $V$ may be obtained finally by computing $W_f$, the information matrix with unknown parameters replaced by their maximum-likelihood estimates, and using $W_f^{-1}$ as an estimate of $V$. For the given set of data we have


W^{-1} =

     0.3080   0.0419  -0.0065   0.0000  -0.0530  -0.0050
     0.0419   0.3890   0.1156   0.0333   0.0161   0.0006
    -0.0065   0.1156   0.7835   0.2238   0.1521   0.0084
     0.0000   0.0333   0.2238   0.9988   0.2942   0.0016
    -0.0530   0.0161   0.1521   0.2942   2.2144   0.0239
    -0.0050   0.0006   0.0084   0.0016   0.0239   0.0022

while

W_f^{-1} =

     0.3090   0.0423  -0.0078  -0.0013  -0.0608  -0.0058
     0.0423   0.3742   0.1218   0.0361   0.0229   0.0012
    -0.0078   0.1218   0.7793   0.2308   0.1572   0.0088
    -0.0013   0.0361   0.2308   1.0123   0.3056   0.0022
    -0.0608   0.0229   0.1572   0.3056   2.2356   0.0249
    -0.0058   0.0012   0.0088   0.0022   0.0249   0.0023


where in each case the elements of the main diagonal refer in order to $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$, $\mu_5$ and $\sigma$. In this case initial approximations happened to be so good that the differences between corresponding elements of $W^{-1}$ and $W_f^{-1}$ are very small. However, if initial approximations had not been so good there might have been considerable differences in the corresponding elements of these matrices and so $W_f^{-1}$ should always be computed.


As mentioned above, estimates $\hat\lambda_i$ of the mean times spent in the different stages, $\lambda_i$ $(i = 1, 2, \ldots, 5)$, are obtained by differencing estimates of the $\mu_i$'s. Also an estimate of the variance matrix of the corresponding estimators of $\lambda_i$ $(i = 1, 2, \ldots, 5)$ and $\sigma$ is obtained by forming the matrix $V_{\lambda\sigma} = AW_f^{-1}A'$, where

A =

     1   0   0   0   0   0
    -1   1   0   0   0   0
     0  -1   1   0   0   0
     0   0  -1   1   0   0
     0   0   0  -1   1   0
     0   0   0   0   0   1

and $A'$ is the transpose of $A$.
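The differencing and the product $AW_f^{-1}A'$ are easy to reproduce. The sketch below uses the Table 2 estimates and the matrix $W_f^{-1}$ given above; as an assumption, the (3,5) element is taken as 0.1572 so that the matrix is symmetric. The helper names are illustrative.

```python
mu_hat = [38.3, 50.9, 62.5, 71.9, 87.9]

# W_f^{-1}: rows and columns in the order mu_1, ..., mu_5, sigma
Wf_inv = [
    [0.3090, 0.0423, -0.0078, -0.0013, -0.0608, -0.0058],
    [0.0423, 0.3742, 0.1218, 0.0361, 0.0229, 0.0012],
    [-0.0078, 0.1218, 0.7793, 0.2308, 0.1572, 0.0088],
    [-0.0013, 0.0361, 0.2308, 1.0123, 0.3056, 0.0022],
    [-0.0608, 0.0229, 0.1572, 0.3056, 2.2356, 0.0249],
    [-0.0058, 0.0012, 0.0088, 0.0022, 0.0249, 0.0023],
]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

# A maps (mu_1, ..., mu_5, sigma) to (lambda_1, ..., lambda_5, sigma)
size = 6
A = [[0] * size for _ in range(size)]
for i in range(5):
    A[i][i] = 1
    if i > 0:
        A[i][i - 1] = -1
A[5][5] = 1

lam_hat = [mu_hat[0]] + [mu_hat[i] - mu_hat[i - 1] for i in range(1, 5)]
V = matmul(matmul(A, Wf_inv), transpose(A))

print([round(x, 1) for x in lam_hat])
print(round(V[4][4], 3), round(V[3][4], 3))
```

With these inputs the estimated variance of $\hat\lambda_5$ comes out close to 2.637 and its covariance with $\hat\lambda_4$ close to -0.633.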


In the present instance we find that

    estimate                 V_{lambda sigma}

    lambda_1 = 38.3      0.309  -0.267  -0.050   0.006  -0.060  -0.006
    lambda_2 = 12.6     -0.267   0.572  -0.202  -0.092   0.046   0.007
    lambda_3 = 11.6     -0.050  -0.202     …       …    -0.060   0.008
    lambda_4 =  9.4      0.006  -0.092     …       …    -0.633  -0.007
    lambda_5 = 16.0     -0.060   0.046  -0.060  -0.633   2.637   0.023
    sigma    =  1.01    -0.006   0.007   0.008  -0.007   0.023   0.003
It will be noticed that the estimates of the $\lambda_i$'s and $\sigma$ are such that they do not invalidate our assumption that the standard deviation of each $\xi_i$ is small relative to its mean. Since the numbers of insects involved are large it is now possible to obtain, in the usual way, confidence intervals for the mean times spent in the different stages. Also, for example, if we had data relating to different batches of insects we could clearly test whether the times spent in the different stages by different batches were significantly different from one another.

We conclude this paragraph by comparing in Table 3 the expected number of insects in each stage at each time (when unknown parameters are replaced by the above estimates), with the corresponding observed number. If entries in this table where expectations are less than 5 are pooled with neighbouring entries at the same date in the obvious way (so that we have a total of twelve observed and expected numbers to compare) then a $\chi^2$-test reveals no significant differences between observed and expected numbers. Since a total of six parameters has been estimated the number of degrees of freedom associated with $\chi^2$ is $12 - 6 = 6$.
E Table 3. Observed and (expected) numbers of insects 
| Stage 
Date | : 

1 | 

1 2 | 3 | 4 | 5 p 
* i E t — ES al 

990 May | a | | 0 0 (0) 

Y | 81 (30-8) 2 (22) ow | 90 | pu | 
ayoo | 78(7e8) | ass) | 0 09 | 909 $ d | 0 0) 
Bero | Gr ] 90 (97-8) 38 (28:8) | 4 (20) 0 (Qu | g Tum 
jogano | 0 (02) | 31 (28-2) 63 (65:6) | 23 (241) pA | uoo) 

July | o (0 | O0 (0 s aa | 1057 | 65 (me | * ( 


7. GENERALIZATION OF PROBIT ANALYSIS

The situation where the methods of quantal probit analysis are applicable is, in the customary terminology of this analysis, as follows. Random samples of subjects are subjected to different doses of a stimulus to each of which a subject may or may not respond. Any dose results in a dichotomy of the subjects to which it is applied into those responding and those not responding. A dose is said to be effective for a subject if it produces a response in that subject and the minimum effective dose for a subject is called its tolerance. One of the main objects of this type of analysis is to estimate mean tolerance (Finney, 1947).

Clearly a situation might arise where in place of a simple dichotomy subjects are divided into more than two classes by any dose of the stimulus. Accordingly we consider the situation where random samples of subjects are subjected to different doses of a stimulus and as a result of the application of the dose $x$, each subject is placed in one of $s+1$ classes. A straightforward example of such an experiment is given by Tattersfield [...]. The particular problem discussed above is another illustration if, in this case, time is regarded as the stimulus.

We now ask what conditions must be satisfied in this general experiment in order that the method of analysis used in our particular case should be applicable. Consideration of § 2 shows that the crucial conditions are

(i) the classes must be ordered, mutually exclusive and exhaustive,

(ii) the reactions of a subject to increasing doses must be systematic in the sense that if dose $x$ places a subject in the $i$th class then a dose greater than $x$ is required to place the subject in the $j$th class whenever $j$ is greater than $i$.

When these conditions are satisfied then the model of § 2 can be used to describe the experimental situation in the following manner. If for a subject chosen at random the tolerance of this subject for the $i$th class is $y_i$ $(i = 1, 2, \ldots, s)$ (i.e. $y_i$ is the minimum dose required to place this subject in the $(i+1)$th class) and if every subject is placed in the first class by dose $y_0 = 0$, then $y_i - y_{i-1}$ can be regarded as an observation on a non-negative random variable $\xi_i$ $(i = 1, 2, \ldots, s)$, which describes the marginal tolerances of the subjects for the $i$th class (i.e. the differences in tolerances for the $i$th and $(i-1)$th classes). The mean value $\lambda_i$ of $\xi_i$ is then the mean marginal tolerance for the $i$th class. Further, the random variable $\eta_r = \sum_{i=1}^{r}\xi_i$ describes the tolerances of subjects for the $r$th class, and $\mu_r = \sum_{i=1}^{r}\lambda_i$ is the mean tolerance for the $r$th class. Methods similar to that given subsequently to § 2 may then be applied to estimate the $\mu_i$'s and/or $\lambda_i$'s, whichever happen to be of biological interest. We emphasize again that the assumption of normality of the $\xi_i$'s which we have used in our illustration is possible only if it can be assumed that the standard deviation of each $\xi_i$ is small compared with its mean.

We remark finally that if $s = 1$ then the present analysis becomes an ordinary probit analysis and it is in this sense that we have generalized probit analysis.
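The relation between marginal tolerances and classes can be made concrete with a small sketch. The function name and the numerical tolerances are hypothetical; the rule encoded is the one stated above, that a subject is in class $i+1$ once the dose reaches its tolerance $y_i$.

```python
from itertools import accumulate

def class_for_dose(marginal_tolerances, dose):
    """Class (1, ..., s+1) in which a subject is placed by a given dose.

    marginal_tolerances holds xi_1, ..., xi_s for one subject; the cumulative
    sums are its tolerances y_1, ..., y_s for successive classes.
    """
    tolerances = list(accumulate(marginal_tolerances))   # y_1, ..., y_s
    return 1 + sum(1 for y in tolerances if dose >= y)

# One hypothetical subject with s = 3, i.e. four possible classes.
subject = [2.0, 1.5, 3.0]          # xi_1, xi_2, xi_3, so y = 2.0, 3.5, 6.5
for dose in (0.0, 2.0, 4.0, 7.0):
    print(dose, class_for_dose(subject, dose))
```

With a single marginal tolerance (s = 1) the function reduces to the responding / not responding dichotomy of ordinary probit analysis.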
EXPERIMENTING WITH ORGANISMS AS BLOCKS

By S. C. PEARCE
East Malling Research Station

An experimenter may usually assume that his treatments will have little effect outside the plot to which each is applied, but sometimes this is not justified. Thus, if an animal is inoculated at several sites with different strains of inoculum it is conceivable that a strain [...]. Again, a tree may have each branch treated according to a different method, and it is possible that a method which encourages cropping on the branch to which it is applied will encourage cropping elsewhere though to a smaller degree. Nor are remote effects always of the same sign as the local ones. If the blossoms of a plant are grouped and each group is pollinated with a different variety, it may well be that a kind of pollen which induces high fruit-set locally will thereby inhibit fruiting elsewhere.

The existence and importance of such remote effects is often problematical, and may need to be investigated. Experience with the method to be described has shown that there are sometimes remote effects of considerable biological interest. On the other hand, it may appear that none exist, and simpler methods can then be used with confidence in the future.
NOTATION

The local effect of a treatment on the plots* to which it is applied will be represented by the parameter $\gamma_j$, where $j$ is the number of the treatment. Its remote effect on the other plots of the block will be written $\delta_j$. It will be assumed that it has no effects in other blocks, which will consist of different organisms. It will also be assumed that the remote effect is the same on all other plots of the block; this is a biological problem, but one that is fairly easy to decide, because the answer can be found in the physiological symmetry or asymmetry of the organism.

There is also the possibility of local and remote effects interacting. Thus, although it has been assumed that the remote effect of treatment $j$ on a given plot does not depend upon the plot itself, it might depend upon the treatment that is being applied locally. The parameter $\phi_{kj}$ will be taken as the additional effect of a plot receiving treatment $k$ when treatment $j$ is being applied elsewhere in the block.

There are instances in which this concept of local and remote effects may not appeal to the biologist. Thus, in the studies of the effects of different kinds of tuberculin described by [...] & Leech (1954), the concept was rather of a systemic effect over the whole guinea-pig irrespective of the site of injection, and of a local effect at the site. Since, however, the systemic effect equals $\delta_j$ in the notation of this paper, and the local effect as now defined [...], little modification of the working is required.

The general parameter will be written as $\alpha$ and the parameter for block $h$ as $\beta_h$.

* The word 'plot' is used because it is the usual statistical term for the unit to which experimental treatments are applied. However, for trials of the kind considered these units will be very dissimilar from 'plots' in an agricultural sense.
The following equations of constraint will be used:

$$0 = \sum_h \beta_h = \sum_j \gamma_j = \sum_j \delta_j = \sum_j \phi_{jt} = \sum_t \phi_{jt}.$$
D j F : 


THE CHOICE OF EXPERIMENTAL DESIGNS

If an orthogonal design, e.g. randomized blocks, is used it is apparent that there is no means of separating the parameters of the remote effects, the evaluation of which is [...]. Thus, within a block all remote effects occur equally often and will be lost by virtue of the equation of constraint involving them. Nor will the parameters $\gamma_j$ and $\delta_j$ be separated in the expressions for treatment totals; thus, if injection of a branch of a tree with a nutrient solution increases the crop as compared with the other branches by 30 lb., it is not possible to say whether there has been a local effect of +30 and no remote effect, or a local effect of +50 and a remote effect of +20, or what. For all the figures prove there might have been a local effect of -30 and a remote effect of -60.

If, however, balanced incomplete block designs are used, it is apparent that comparisons within blocks will provide estimates of $(\gamma_j - \delta_j)$ for all $j$, as before; also comparisons between blocks will give estimates of $[\gamma_j + (k-1)\delta_j]$, where $k$ is the number of plots to a block. Provided that the inter-block differences are determined with reasonable accuracy, it is thus possible to estimate $\gamma_j$ and $\delta_j$ separately to a useful degree of precision. The design proposed is, in effect, one of $b$ kinds of block, with $v$ treatments, $k$ plots to a block, and $n$ of each kind of block so that there are $nb$ blocks in all. There is no necessity for $n$ to exceed one, but in practice it is usually desirable if the inter-block comparisons are to be of use.

Where it is required to take account of the interactions between local and remote effects, choice must be restricted to a narrower range of designs. Thus if the normal equations are to be manageable the design must be unreduced,* i.e. all combinations of the $v$ treatments taken $k$ at a time must occur equally often in blocks, so $b$ must equal $\binom{v}{k}$. Also, if there are to be any degrees of freedom for error and if $k = v - 1$, $n$ must be at least two. Also, it will appear that $k$ must be at least three, which means that $v$ must be at least four.

ANALYSIS OF DATA

(i) With interactions ignored

This is not difficult. Let the grand total of all data be $G$, let the total of data from treatment $j$ be $T_j$, and let the total from all blocks that do not contain treatment $j$ be $U_j$.

* It has been pointed out to the writer by Mr G. H. Freeman that this restriction to unreduced designs is sufficient to ensure that no normal equation shall contain more than two parameters (i.e. $\phi_{jt}$ and $\phi_{tj}$), but it is not necessary. The necessary restriction is this: Considering only blocks that contain treatment $i$ but not treatment $j$, and omitting from them all plots to receive treatment $i$, then, for all $i$ and $j$, the remaining design must be in balanced incomplete blocks. If the parameters of the remaining design be given the conventional symbols, $b$, $k$, $r$, $v$, $\lambda$, and if capitals refer to the original design, then

$$B = \frac{b(v+1)(v+2)}{(v-k+1)(k+1)}, \quad K = k+1, \quad R = \frac{b(v+1)}{(v-k+1)}, \quad V = v+2 \quad\text{and}\quad \Lambda = \frac{bk}{(v-k+1)}.$$

The simplest integral solution of these equations, apart from those arising from unreduced designs, appears to be $b = 18$, $k = 4$, $r = 8$, $v = 9$, $\lambda = 3$, $B = 66$, $K = 5$, $R = 30$, $V = 11$, $\Lambda = 12$. It is not known if such designs exist, but if they do they are too large to be of practical importance. Their possible existence does not justify complicating the algebra. Nevertheless, Mr Freeman's [...] is of some theoretical interest.
Estimating treatment effects within blocks gives $(\gamma_j - \delta_j)$ equal to

$$\frac{(v-1)}{nbk(k-1)}\,(kT_j + U_j - G).$$

Estimating them between blocks gives $[\gamma_j + (k-1)\delta_j]$ as

$$\frac{(v-1)}{nbk(v-k)}\,[(v-k)G - vU_j].$$

Hence, estimate of $\gamma_j$ equals

$$\frac{(v-1)}{nbk(v-k)}\,[(v-k)T_j - U_j],$$

and estimate of $\delta_j$ equals

$$\frac{(v-1)}{nbk(k-1)(v-k)}\,[(v-k)G - (v-1)U_j - (v-k)T_j].$$

The standard errors of these estimates are likewise readily obtained. If $M$ be the error mean square of the analysis between blocks and $M'$ that of the analysis within, and writing

$$M^0 = M + \frac{(k-1)(v-k)}{v}\,M'$$

and

$$M^* = M + \frac{(v-k)}{v(k-1)}\,M',$$

then

$$[\text{standard error of estimate of } \gamma_j]^2 = \frac{(v-1)^2}{nbk^2(v-k)}\,M^0,$$

the corresponding expression for the standard error of the estimate of $\delta_j$ being given by the replacement of $M^0$ by $M^*$. If $k$ equals two, $M^0$ and $M^*$ are equal, each plot being exposed to a parameter for its remote effect as often as to the one for its local effect. As $k$ increases from two, $M^*$ becomes progressively smaller than $M^0$, and the remote effects are determined with ever greater accuracy as compared with the local effects. This is just as well, because remote effects are likely to be the smaller and so [...] them.

If the treatments had been applied to whole organisms, estimates would have been obtained of the quantities $[\alpha + \gamma_j + (k-1)\delta_j]$. These values can also be obtained by adding [...]; if this is done, the (standard error)$^2$ of the difference between two such values is

$$\frac{2v(v-1)}{nb(v-k)}\,M.$$

It should be noted that just as the parameter $\gamma_j$ has been adjusted to $(\gamma_j - \delta_j)$ in the analysis within blocks, so also has $\beta_h$ been adjusted to $\beta_h^* = \beta_h - \sum\delta_j$, summation taking place over $j$ for treatments lacking in block $h$.

(ii) With interactions allowed for

Where interactions are considered the equations already given still apply, because the expressions for $T_j$, $U_j$ and $G$ will contain no parameters of the type $\phi_{jt}$. It follows that if $\phi_{jt}$ is to be evaluated use will have to be made of further quantities; the following will be found convenient:

$D_{jt}$, the total for treatment $j$ in blocks from which treatment $t$ is absent.

$E_{jt}$ $(= E_{tj})$, the total for blocks containing neither treatment $j$ nor $t$.

The total for blocks not containing treatment $t$; this is of course $U_t$.

Thus, $D_{jt}$, $E_{jt}$ and $U_t$ are respectively analogues of $T_j$, $U_j$ and $G$. It will be convenient to write

$$F_{jt} = kD_{jt} + E_{jt} - U_t \quad\text{and}\quad V_t = kT_t + U_t - G.$$
The parametric expression for F, is 


—3\[(v-k-1 kv—3k—v41 
Hea [ya & EE gat- D-73379). 


Since, as has been shown, the corresponding expression for $V_j$ is

$$\frac{nbk(k-1)}{(v-1)}\,(\gamma_j - \delta_j),$$

it follows that

$$(v-1)(v-2)(v-k)[(k-1)V_j + V_t] - k(v-k)[V_j + V_t] - v(v-1)(v-2)[(k-1)F_{jt} + F_{tj}] + kv(v-2)[F_{jt} + F_{tj}]$$

is an estimate of

$$nk(k-2)(v-k)\,v(v-1)\binom{v-2}{k-2}\phi_{jt}.$$

This expression can be set out in other ways, but it does not appear that any of the alternatives is more convenient from the point of view of practical computation, except when $k = v - 1$, when it is easier to take

$$(v-1)V_j + V_t - v(v-2)F_{jt}$$

as an estimate of $nv(v-1)(v-2)\phi_{jt}$. It may be noted that in another special case, i.e. $k = 2$,

$$(k-1)V_j + V_t = V_j + V_t = F_{jt} + F_{tj} = (k-1)F_{jt} + F_{tj},$$

and the expression equals zero, as does the coefficient of $\phi_{jt}$. It is for this reason that the condition $k \ge 3$ was made.


If $v - k = 1$, a test of the interactions can very readily be made, for in this case the parameters, $\phi_{jt}$, are orthogonal with the parameters, $\beta_h^*$, as well as with $\alpha$ and $(\gamma_j - \delta_j)$. First, an analysis of variance is worked out ignoring interactions and taking account only of the parameters, $\alpha$, $\beta_h^*$ and $(\gamma_j - \delta_j)$; let this give an error sum of squares $S'$ with

$$[nb(k-1) - (v-1)]$$

degrees of freedom. It is now necessary to evaluate the interaction sum of squares, $W$, and this is best done by multiplying the estimated value of each interaction parameter, $\hat\phi_{jt}$, by the total of the data to which it applies, i.e.

$$W = \sum_{j,t}\hat\phi_{jt}(T_j - D_{jt}) = -\sum_{j,t}\hat\phi_{jt}D_{jt}.$$

The quantity $S$ $(= S' - W)$ now becomes the error sum of squares. Since $W$ has $(v^2 - 3v + 1)$ degrees of freedom, $S$ has $[nb(k-1) - v(v-2)]$. A comparison of $W$ and $S$ by the F-test will show if there is any evidence of an interaction of local and remote effects.
If $v - k \ge 2$, $\beta_h^*$ needs to be adjusted to


I ri la a lv 
Bi = By i (A+ Y) =fr- iA (Dirt hy): 


where $\Sigma'$ denotes summation where both treatments $j$ and $t$ are present in block $h$, and $\Sigma''$ denotes similar summation when neither treatment $j$ nor $t$ is present. The choice of summation is a question of convenience. It follows from this that $S'$ will need to be adjusted by adding $Z = \sum_h B_h(\beta_h^* - \beta_h^{**})$, where $B_h$ is the total of data from block $h$. $S$ now equals $S' + Z - W$, the degrees of freedom for $W$ and $S$ being as before.


VALIDITY OF THE TEST FOR INTERACTIONS

It is not at once evident that the F-test for interactions is unbiased, but examination shows it to be so. Suppose that there is in fact no interaction, then the parametric expression for $y_{hj}$, the figure given by treatment $j$ in block $h$, may be written

$$\alpha + \beta_h^* + (\gamma_j - \delta_j) + x_{hj},$$

where $x_{hj}$ is a random residual. Let $T'_j$, $U'_j$, $D'_{jt}$, etc., be functions of the $x$'s corresponding to $T_j$, $U_j$, $D_{jt}$, etc., as functions of the $y$'s, then $U'_j = E'_{jt} = G' = 0$ and $V'_j = kT'_j$ and $F'_{jt} = kD'_{jt}$. Consequently, if $\hat\phi_{jt}$ be evaluated now, i.e. in the case where there are in fact no interactions,

$$(v-1)(v-2)(v-k)[(k-1)T'_j + T'_t] - k(v-k)[T'_j + T'_t] - v(v-1)(v-2)[(k-1)D'_{jt} + D'_{tj}] + kv(v-2)[D'_{jt} + D'_{tj}]$$

will be taken as the value of

$$n(k-2)(v-k)\,v(v-1)\binom{v-2}{k-2}\hat\phi_{jt}.$$


It r ; E » 
t remains to evaluate W assuming the null-hypothesis. Since $y = fir 
"E i Ë ters that is determinate 
Ef Dy. Also, Dj equals Dj, plus a linear parame a 


function of 

for 
ARE M 

ach combination of j and t. 


et € renn 4 
e(x2 € represent expectation under r 
)=0. Since 


andom permutation of plots within blocks, and let 


e(T^) = (Ty) = (Di) = e(Di;) = 9 
v— i| 0 
e(T; Dj) = (Di) =” ra i 
eT; Diy) = e(Dj Dr) = 9 
It fon k(v2—30+ 1) 
ows that eW) = ( ie 1) 0, 
Le, p is is true. But this is 
* KOL 1) ¢ „vided the null-hypothesis 1s 
E "ia rm Sata p e = fase a design in i cl! c 
o r S', which is the error v — W. Thus, by virtue of the 
mpg and hence it must be the same for S, W Be a and S is unbiased, at 
lea, *Mization of treatments within blocks, the 
In ESI p ] = i ed. This has been 
Ws p. ME cases, however. there is the a s in block h. 
N la 2 re nelt. 
: s Ca € ufieimsoF iue Yom pp "cm o. it follows that «(Z) — 0. It thus 
Pears " nT) = eB, Tj) = (B, Dj) FI wes all the designs considered. 
S that the F-test between W anc 


Ov 
f squares of 
hen this equa 
F-test between 


J jg valid for 
PaaS Biom. 44 


1o 
NUMERICAL EXAMPLE

The data of this example are drawn from the first trial to be designed at East Malling using the above approach. As such, it is open to improvement, but is for that reason instructive. The experiment was intended to study metaxenia in apples. Metaxenia is the phenomenon of pollen affecting the characteristics of the plant pollinated, and it was proposed to test whether it is in fact true that the size of Cox's Orange Pippin apples depends in part on the pollen. In particular, there was a suggestion that pollination by King of the Pippins gives large fruit.


Table 1. Data of experiment on xenia effects in apples

Figures represent mean fruit diameters in millimetres

                              Kind of pollen, j
Wave      Block no., h     1        2        3        4      Values of B_h
First          1           -      18.71    17.01    17.23       52.95
               2         20.42      -      18.02    19.59       58.03
               3         21.39    20.85      -      18.94       61.18
               4         19.49    18.48    17.22      -         55.19
Second         5           -      23.16    23.33    23.34       69.83
               6         24.67      -      24.54    25.00       74.21
               7         24.14    21.78      -      20.86       66.78
               8         26.70    25.75    24.77      -         77.22

Values of T_j           136.81   128.73   124.89   124.96    G = 515.39
          U_j           122.78   132.24   127.96   132.41
          V_j           +17.82    +3.04   -12.76    -8.10


In view of the great labour of emasculating blossoms and then hand-pollinating them, it has previously been the practice to divide the blossom trusses into groups to serve as plots, using whole potted trees as blocks. At the suggestion of the writer, for this trial, conducted in 1952, a balanced incomplete block design was chosen. Pollen from four varieties was to be used ($v = 4$), namely, 1 = King of the Pippins, 2 = Ellison's Orange, 3 = James Grieve, and 4 = Worcester Pearmain. The blossoms of each tree were divided into three groups instead of four ($k = 3$), and there were accordingly four kinds of blocks ($b = 4$) depending on the pollen omitted. Twelve trees were selected; four were brought on by heat treatment, four were allowed to develop naturally, and four were retarded [...], but these last did not appear wholly normal and were discarded. In each of the two waves remaining one tree was allocated at random to each kind of block, the pollens being applied each to one group of blossoms, so that there were two blocks of each kind ($n = 2$). The grouping of blocks into waves gave rise to only trivial modifications of the standard method; it was adopted in order to spread out the labour of applying treatments. Blocks were assigned numbers after randomization, arranged in order, and this accounts for the apparently systematic design.

The figures in Table 1 represent average fruit diameter in millimetres over each group of blossoms at 49 days after pollination, together with values of $B_h$, $T_j$, $U_j$ and $V_j$. As a check it may be noted that the $B$'s and the $T$'s should sum to $G$, the $U$'s to $(v-k)G$, and the $V$'s to zero.

Table 2. Values of (gamma_j - delta_j) and 24 beta_h*

j                        1          2          3          4
(gamma_j - delta_j)   +1.11375   +0.19000   -0.79750   -0.50625

h               1        2        3        4        5        6        7        8
24 beta_h*   -82.88   -49.63   -32.33   -77.92   +52.16   +79.81   +12.47   +98.32

Table 3. Values of D_jt, F_jt and 144 phi_jt

                      j = 1      j = 2      j = 3      j = 4
D_jt         t = 1      -        41.87      40.34      40.57
             t = 2    45.09        -        42.56      44.59
             t = 3    45.53      42.63        -        39.80
             t = 4    46.19      44.23      41.99        -

F_jt         t = 1      -        +2.83      -1.76      -1.07
             t = 2    +3.03        -        -4.56      +1.53
             t = 3    +8.63      -0.07        -        -8.56
             t = 4    +6.16      +0.28      -6.44        -

144 phi_jt   t = 1      -       +12.90     -19.14      +6.24
             t = 2   +96.78        -        +3.72    -100.50
             t = 3   -85.02      -9.24        -       +94.26
             t = 4   -11.76      -3.66     +15.42        -

Now $(\gamma_j - \delta_j)$ is estimated by $(v-1)V_j/nbk(k-1)$, which leads to the values set out in Table 2. Further, $B_h$ equals

$$k\alpha + k\beta_h^* - [\text{sum of } (\gamma_j - \delta_j) \text{ for treatments lacking in block } h].$$

Since $\alpha$ is estimated by $G/nbk$, values of $24\beta_h^*$ can be determined as in Table 2. Both $(\gamma_j - \delta_j)$ and $\beta_h^*$ should sum to zero.

From these values it is possible to work out $S'$, the residual sum of squares within blocks, ignoring interactions. This equals

$$\sum(\text{data})^2 - G^2/nbk - \sum_j(\hat\gamma_j - \hat\delta_j)T_j - \sum_h\hat\beta_h^*B_h,$$

namely, 6.7350 with 13 degrees of freedom.

To proceed now to the study of possible interactions, the first step is to write down the values of $D_{jt}$, as is done in the top part of Table 3. These should sum horizontally to $U_t$ and vertically to $(v-k)T_j$. The next values needed are those of $E_{jt}$, but in this instance, where $v = k+1$, they are all zero and need not be considered further. In the general case they should sum horizontally to $(v-k-1)U_t$ and vertically to $(v-k-1)U_j$.

From a knowledge of $U_t$, $D_{jt}$ and $E_{jt}$ it is easy to evaluate $F_{jt}$; the results of doing so are shown in the middle part of Table 3, and sum horizontally to zero and vertically to $(v-k)V_j$.

From these values it is a straightforward matter to obtain estimates of the interaction parameters, $\hat\phi_{jt}$, though it is usually easier to estimate some multiple of them. In this example, to maintain generality the full expression has been used, and the figures in the bottom part of Table 3 are of $144\hat\phi_{jt}$; had the simplified expression for the case $v = k+1$ been used, the multiple would have been 48. The figures should sum both ways to zero.


Now, $W = -\sum_{j,t}\hat\phi_{jt}D_{jt}$, which in this case comes to 3.5340. It has five degrees of freedom.

Since $v = k+1$, no adjustment, $Z$, is needed.
This gives rise to the analysis of variance: 


Source                   D.F.     S.S.      M.S.
Interactions               5     3.5340    0.7068
Error (by difference)      8     3.2010    0.4001
Total                     13     6.7350
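The arithmetic of this table can be checked in a few lines. The sketch below uses only the sums of squares and degrees of freedom quoted above; the variable names are illustrative, and the variance ratio is added here only as a convenient summary of the comparison the text goes on to make.

```python
# Figures quoted in the analysis of variance above.
ss_interactions, df_interactions = 3.5340, 5
ss_total, df_total = 6.7350, 13

# The error line is obtained by difference, as in the table.
ss_error = ss_total - ss_interactions
df_error = df_total - df_interactions

# Mean squares and the ratio of interaction to error mean square.
ms_interactions = ss_interactions / df_interactions
ms_error = ss_error / df_error
f_ratio = ms_interactions / ms_error

print(f"Error: S.S. = {ss_error:.4f}, D.F. = {df_error}, M.S. = {ms_error:.4f}")
print(f"Interactions: M.S. = {ms_interactions:.4f}, variance ratio = {f_ratio:.2f}")
```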


This shows that the interaction effects are not significant; also there is no suggestion of
regularity in the table of 144 times the estimated interactions, so it was decided to ignore possible interactions.
It remains to evaluate $\gamma_j$ and $\delta_j$ separately. This is done quite easily:


Y= 1554, 0, = +0-640, 

Y2 = —0:439, ô, = — 0-629, 

Y3 = —0:384, 04 = +0-414, 

Ys=—0-931, ô, = — 0:425. 
The values of $\gamma_j$ and $\delta_j$ should both sum to zero. The former series is of great interest, because
the figures do indeed suggest that pollen from King of the Pippins (j = 1) gives larger fruits
than do the other pollens. The statistical significance can only be judged from a knowle z 
of M? and M+, which in turn depend upon M and M'. M’ 


g 
' vari i ; may beevaluated from the analy® 
of variance just given as 0-4001; to find M it is necessar 


15+ 
y to consider the values By, thu 
Type of block 


» 
1 2 3 
Ist wave 52-95 58-03 
x 28:0: 61- 
2nd wave 69-83 74-21 d 


66-78 
Eliminating both theeffects of typeof block 


ois 
n andof waves, and rememberin gthateach figu" ph 

the sum of three data, these give an error mean Square of 7-9216 with three degrees of free i ) 
Hence, M = 8:1217 and M* = 7-9716, and the standard errors of (y;— Yi) and (2; ^ is 
are respectively 1.645 and 1.630. Degrees of freedom are very few; even so the figures
provide some support for the belief that the kind of pollen affects size of fruit, especially
as the effect observed was nominated beforehand. The values of $\delta_j$ are not so different as to
demonstrate the existence of remote effects.


Discussion 


m 
The usefulness of this method depends upon the questions to be answered. If remote effects are to be expected, and if the aim is to assess the effect of treating whole plants or animals, nothing is to be gained by treating parts of them. In such a trial the quantities to be estimated are $\gamma_j + (k-1)\delta_j$ for the various j's, and the accuracy will in any case depend on M and not at all on M'. Further, if whole organisms are used as plots, the variance of differences between estimated treatment effects will be $2vM/(nbk)$; whereas if they are used as blocks the variance will be $2v(v-1)M/\{nb(v-k)\}$, so the additional labour of using small plots may actually have led to a substantial loss of accuracy.

If, on the other hand, the experimenter is concerned to investigate the mechanism of his treatments, the proposed method is of especial value. Thus, in a recent trial the remote effects appeared to be nearly equal to the local ones, and this was of some interest, indicating that the local application of a treatment affected all parts of the organism to about the same degree. In other circumstances, it may appear that treatments are strictly local in their effects, or that remote effects exist, and are or are not proportionate to local ones, and if proportionate are of the same or opposite sign. All these phenomena may be suggestive of modes of action.
A further use of the method, however, comes in exploring the desirability of using parts of organisms as plots. If it can be shown that treatments of a certain kind have only local effects, or have remote effects that are proportionate to the local ones, so that they do not invalidate a test, then in future study of such treatments it will be possible to use parts of the animal or plant as plots without regard to remote effects, and to use any convenient block design. Conclusions will then be based on M' rather than on M with a gain in precision that may be considerable, and less material will be needed. Further, in proving the possibility of doing this, some information will have been gained as to the effects of the treatments, much of it at a useful level of accuracy.
In using the method described it is clear that the standard error between organisms is as important as that within, and it must, therefore, be based on a sufficient number of degrees of freedom. This was a fault of the trial from which the example was taken, though it had been the original intention to use twelve trees rather than eight. The need for a reasonable number of degrees of freedom for M means that the method is not economical of material as long as local and remote effects are both studied. If at a later stage it is decided that remote effects can be ignored, and accordingly whole plants or animals can be used as blocks without any of the complications discussed in this paper, the economies may be very great.


SUMMARY 


The implications of using organisms as blocks are considered, special attention being paid to the case of a treatment applied to one plot having a local effect on the plot to which it is applied and remote effects elsewhere in the block, and also to the case of remote and local effects interacting. A worked example is given, and the usefulness of the method is discussed.
ydeman, who provided the data of the numerical 


Ih 
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THE USE OF A CONCOMITANT VARIABLE IN SELECTING 
AN EXPERIMENTAL DESIGN* 


By D. R. COX 
Department of Biostatistics, School of Public Health, University of North Carolina 
and Department of Statistics, University of Californiat 


1. INTRODUCTION 


Suppose that we have t treatments for comparison using N experimental units, for example animals, and that on each unit we have a preliminary observation, for example, the initial weight, available before the treatments are allotted to the units. Suppose that it is expected that the final observation of interest, y, will, in the absence of treatment effects, be correlated with the preliminary observation, x. Then a number of standard methods are available for exploiting this correlation in order to increase the precision of the estimated treatment effects.

The object of this paper is to compare the methods in some simple situations with a small number of experimental units. Attention is restricted to experiments in which each of the alternative treatments appears the same number, k, of times, so that N = tk.


2. SOME STANDARD METHODS 


We list first the following methods for increasing the precision of the treatment comparisons.

Method I. An index of response is used, for example, y/x, y - x, etc. The index may be suggested by analysis of previous data or by general consideration of a plausible form for the regression relation between y and x.

Method II. The treatments are completely randomized over the experimental units and an adjustment made for regression on x by analysis of covariance. For simplicity only linear regression will be considered.

Method III. The experimental units are ranked in order of increasing x and then grouped into k blocks of t units each, the first block, for example, consisting of the t units with lowest values of x. A randomized block arrangement is then constructed, based on this grouping of the units.

Method IV. This is applicable when k is a multiple of t. The units are ranked in order of increasing x and grouped into blocks as in Method III. For example, with t = k = 4, N = 16, the units are numbered from 1, ..., 16, in order of increasing x and a Latin square set out, as in the following example:


Order within block 
AW 


p 
Block no. 1 2 3 4 
1 1:75 2:7, 3:T. 
š m. 4T, 
2 5:T, 6:7, 1:75 8:7; 
3 9:7, 10:7, Tim. 12:7, 
4 13:7, 14:7, 15:7, 16:7, zd pd 
* This paper was prepared with the partial support of the Air Research and Development Command under contract with the U.S.A.F. School of Aviation Medicine.
† Present address, Birkbeck College, University of London.

A similar method, employing Youden squares, could be used for other suitable values of k, but would require a lengthier analysis.

Method V. Methods II and III may be combined by using the randomized block design of Method III together with a covariance adjustment for x.

Method VI. It often happens that the preliminary observation x is not the only thing known about the units that can be used to increase precision. One of a number of possibilities is that a grouping into randomized blocks should be made on the basis of the other criteria and a covariance adjustment made for x.

Method VII. A systematic, or non-randomized, design may be used, chosen, for example, to maximize the precision under the hypothesis that the regression equation of y on x is a polynomial of low degree (Cox, 1951). The use of such a design has the disadvantages attendant on the lack of full randomization.

An eighth method is that of Papadakis (see, for example, Bartlett (1938)). This is useful in large experiments in which there are expected to be trends or serial correlation between experimental units adjacent in space or time, but will not be considered here.
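The grouping step of Method III can be sketched in code. This is a minimal illustration, not taken from the paper: the function name `randomized_blocks`, the seed argument and the example values of x are all assumptions made here. It ranks the units by x, cuts the ranking into blocks of t consecutive units, and randomizes the t treatments independently within each block.

```python
import random

def randomized_blocks(x, t, seed=None):
    """Rank units by x, group them into blocks of t, and randomize within blocks.

    Returns a list of (block, unit_index, treatment) tuples; treatments are
    coded 0, ..., t-1 and blocks 0, ..., k-1, where k = len(x) // t.
    """
    n = len(x)
    if n % t != 0:
        raise ValueError("number of units must be a multiple of t")
    rng = random.Random(seed)
    order = sorted(range(n), key=lambda i: x[i])  # unit indices by increasing x
    plan = []
    for b in range(n // t):
        units = order[b * t:(b + 1) * t]          # the t units of block b
        treatments = list(range(t))
        rng.shuffle(treatments)                   # independent randomization per block
        plan.extend((b, u, tr) for u, tr in zip(units, treatments))
    return plan

# Hypothetical preliminary observations on N = 8 units, with t = 2 treatments.
x = [3.1, 0.4, 2.2, 1.7, 0.9, 2.8, 1.1, 3.6]
plan = randomized_blocks(x, t=2, seed=1)
for block, unit, treatment in plan:
    print(block, unit, x[unit], treatment)
```

The first block always contains the t units with the lowest x, whatever the seed; only the allocation of treatments within a block is random.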


3. BASIS FOR COMPARISON 


Denote the t treatments by $T_1, \ldots, T_t$ and the corresponding estimated treatment means, after adjustment by covariance where appropriate, by $\hat{\tau}_1, \ldots, \hat{\tau}_t$. We measure the random errors associated with the design by the variance of the estimated difference between a pair of treatments, averaged over all pairs of treatments, i.e. by

$$V_t = \operatorname{ave}_{i \neq j} V(\hat{\tau}_i - \hat{\tau}_j). \tag{1}$$

This is a multiple of the population residual variance $\sigma_0^2$ and is not affected by errors in estimating the residual variance. We call (1) the true (average) imprecision of the design; it is slightly different from the quantity suggested by Lucas and used by Greenberg (1953).

Often we are interested in the apparent imprecision, $V_a$, making due allowance for the loss of information that arises in estimating the residual variance. To obtain this we use Fisher's factor $(f+3)/(f+1)$, where f is the number of degrees of freedom available for estimating the residual variance (Cochran & Cox, 1950, p. 26); that is, we put

$$V_a = \frac{f+3}{f+1} V_t. \tag{2}$$

This will be supplemented by the rather arbitrary rule that if f < 5 no effective estimate of the residual variance is considered possible from the data alone.

From the point of view of the length of confidence intervals and of significance tests, a situation with $V_a = 1$ and with the variance estimated is nearly equivalent to a situation in which the residual variance is known and $V_t = 1$. The use of the average variance in (1) is a natural step but may be misleading, particularly in small experiments where there may be appreciable variation between different randomization patterns and between different comparisons in the same randomization pattern. This is particularly so with Method II.
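The step from true to apparent imprecision is simple enough to state as a function. The sketch below assumes the reading of equation (2) as $V_a = V_t (f+3)/(f+1)$ and of the supplementary rule as "no effective estimate when f < 5"; the function name and the use of `None` for that case are choices made here for illustration.

```python
def apparent_imprecision(v_true, f):
    """Apply Fisher's factor (f + 3)/(f + 1) to the true imprecision.

    Returns None when f < 5, following the rule in the text that no effective
    estimate of the residual variance is then considered possible.
    """
    if f < 5:
        return None
    return (f + 3) / (f + 1) * v_true

# With 11 residual degrees of freedom the factor is 14/12.
print(apparent_imprecision(1.0, 11))
print(apparent_imprecision(1.0, 4))
```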
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4. CALCULATION OF IMPRECISION 


To compare the different methods it will be assumed that, in the absence of treatment effects, y and x have a bivariate normal frequency distribution with correlation coefficient $\rho$, and with the variance of y for fixed x denoted by $\sigma_0^2$. If the effect of variation accounted for by x were completely removed, we should have $V_t = 2\sigma_0^2/k$. Therefore we write

$$V_t = \frac{2\sigma_0^2}{k} I_t, \tag{3}$$

$$V_a = \frac{2\sigma_0^2}{k} I_a, \tag{4}$$

and use $I_t$ and $I_a$ as indices of true and apparent imprecision. Clearly $I_a > I_t \geq 1$.
We now compute $I_t$ and $I_a$ for the various procedures set out in § 2.


Table 1. Loss of precision from using wrong index of response

          Range of β0/β within which I_t ≤                        I_t if x
  ρ          1.1              1.2              1.5                ignored
  0.2    (-0.55, 2.55)    (-1.19, 3.19)    (-2.46, 4.46)           1.04
  0.4    (0.28, 1.72)     (-0.02, 2.02)    (-0.62, 2.62)           1.19
  0.6    (0.68, 1.32)     (0.40, 1.60)     (0.06, 1.94)            1.56
  0.8    (0.76, 1.24)     (0.67, 1.33)     (0.47, 1.53)            2.78
  0.9    (0.82, 1.18)     (0.74, 1.26)     (0.59, 1.41)            5.26
  0.95   (0.90, 1.10)     (0.85, 1.15)     (0.77, 1.23)           10.26

Method I. Suppose that the true regression coefficient of y on x is $\beta$ and that the index of response is $y - \beta_0 x$. This situation has been considered by Gourlay (1953). It is easily shown that the index of true imprecision is

$$I_t = 1 + \frac{\rho^2}{1-\rho^2}\left(1 - \frac{\beta_0}{\beta}\right)^2 \quad (\rho \neq 0), \qquad I_t = 1 + \frac{\beta_0^2 \sigma_x^2}{\sigma_0^2} \quad (\rho = 0), \tag{5}$$

where $\sigma_x^2$ is the variance of x. If no attempt is made to use x, i.e. if $\beta_0 = 0$, $I_t = (1 - \rho^2)^{-1}$. Thus the attempted correction is an advantage whenever $0 < \beta_0/\beta < 2$. Table 1 shows the ranges of values of $\beta_0/\beta$ for $I_t \leq 1.1, 1.2, 1.5$ and so tells us how near $\beta_0$ and $\beta$ need to be to avoid losing an appreciable amount of information.
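The ranges in Table 1 follow directly from the expression for the index of true imprecision. The sketch below assumes the form $I_t = 1 + \{\rho^2/(1-\rho^2)\}(1 - \beta_0/\beta)^2$ for $\rho \neq 0$; the function names are illustrative. Solving $I_t \leq c$ for the ratio $\beta_0/\beta$ gives an interval centred on 1.

```python
import math

def index_true_imprecision(rho, ratio):
    """I_t for the index y - beta0*x, with ratio = beta0/beta and 0 < |rho| < 1."""
    return 1.0 + rho**2 / (1.0 - rho**2) * (1.0 - ratio) ** 2

def ratio_range(rho, bound):
    """Interval of beta0/beta within which I_t <= bound (bound >= 1)."""
    half_width = math.sqrt((bound - 1.0) * (1.0 - rho**2) / rho**2)
    return (1.0 - half_width, 1.0 + half_width)

# Range for rho = 0.4 and I_t <= 1.1; Table 1 quotes (0.28, 1.72).
low, high = ratio_range(0.4, 1.1)
print(round(low, 2), round(high, 2))

# Ignoring x altogether corresponds to beta0 = 0, i.e. ratio = 0.
print(round(index_true_imprecision(0.95, 0.0), 2))
```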


Fisher (1935, p. 163) criticized the use of indices of response based on inadequate assessments of the relation between y and x. It is no contradiction of such criticisms to conclude from Table 1 that, particularly if $\rho$ is not near unity, $\beta_0$ does not have to be very near $\beta$ to give a worth-while increase in precision. The use of non-linear indices such as y/x enables curvilinear regression to be accounted for. Note, however, that if the treatment effects are constant, independently of x, on the y scale, they will not be constant when the index is used; if the index and the original observation have equal physical significance there
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wel 
Mte sa Vien E crac one rather than the other to be the appropriate scale for 
lign i: Sitiga e : oe E the object is the estimation of the effect on the 
a oat "ges naive use of ncm ot erige a. 
Method II. Denote the terms of the analysis of covariance as follows:

Treatments    T_xx    T_xy    T_yy
Residual      R_xx    R_xy    R_yy
Total         S_xx    S_xy    S_yy

where, for example, $R_{xy}$ denotes the residual sum of products of x and y. If $\bar{x}_i$ is the mean of x for the ith treatment and $\hat{\tau}_i$ the mean of y adjusted for regression on x,

$$\operatorname{ave}_{i \neq j} V(\hat{\tau}_i - \hat{\tau}_j) = \frac{2\sigma_0^2}{k}\left\{1 + \frac{T_{xx}}{(t-1)R_{xx}}\right\}. \tag{6}$$

This is conditional on fixed x's; but the x's are in fact a random sample from a normal population and the second term in brackets is therefore proportional to an F variate. Hence if we take expectations and remove the factor $2\sigma_0^2/k$, we obtain for the index of true imprecision

$$I_t^{(2)} = 1 + \frac{1}{N-t-2}, \tag{7}$$

and for the index of apparent imprecision

$$I_a^{(2)} = \frac{N-t+2}{N-t}\left(1 + \frac{1}{N-t-2}\right), \tag{8}$$

since the residual degrees of freedom, after adjusting for regression, are N - t - 1. Table 2 gives the values of (7) and (8) for various values of t, k with N ≤ 20.

The increase in variance above that to be expected when the effect of x is completely eliminated can be described as due to errors in estimating the residual regression coefficient, or as arising from the non-orthogonality present in the linear set-up chosen to represent the populations sampled.

Method III. Consider first one block and a fixed set of x values $x_1, \ldots, x_t$. In the absence of treatment effects, the corresponding observations on y are $\beta x_1 + \epsilon_1, \ldots, \beta x_t + \epsilon_t$, where $\beta$ is the regression coefficient of y on x and $\epsilon_1, \ldots, \epsilon_t$ are independent with constant mean and with variance $\sigma_0^2$, and determine the dispersion of y about its regression line on x. Hence if two units are selected randomly to carry treatments $T_i$ and $T_j$, the expected mean-square difference between the resulting observations is

$$2\sigma_0^2 + \frac{2\beta^2 \sum (x - \bar{x})^2}{t-1}.$$

Hence, averaging over the k blocks and over the distribution of the x's, we have that

$$I_t^{(3)} = 1 + \frac{\rho^2}{1-\rho^2} W, \tag{9}$$

where W is the expected mean square of x within blocks, divided by the variance of x.

We need, in order to calculate W, to consider the following. Take an ordered sample of N from a unit normal population. Divide this into blocks as described above and form the mean square within blocks. Then W is the expected value of this mean square and can be calculated for N ≤ 20 from recently published tables of the second moments of order statistics in a normal sample (Teichroew, 1956); Table 2 gives some numerical values derived in this way.
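The text obtains W from tables of the second moments of normal order statistics. As a rough cross-check, W can also be estimated by simulation; the sketch below swaps in a Monte Carlo estimate for the tabulated calculation, so it is not the method of the paper. The function names, the number of simulated samples and the seed are assumptions made here, and the Method III index is taken in the form $1 + \rho^2 W/(1-\rho^2)$.

```python
import random

def estimate_w(t, k, n_samples=20000, seed=0):
    """Monte Carlo estimate of W for k blocks of t units from a unit normal sample.

    Each simulated sample of N = t*k values is sorted, cut into k consecutive
    blocks of t, and the mean square within blocks (k*(t-1) degrees of freedom)
    is averaged over samples. The population variance is 1, so no rescaling.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(t * k))
        within_ss = 0.0
        for b in range(k):
            block = xs[b * t:(b + 1) * t]
            mean = sum(block) / t
            within_ss += sum((v - mean) ** 2 for v in block)
        total += within_ss / (k * (t - 1))
    return total / n_samples

def method3_index(rho, w):
    """Index of true imprecision for blocking on x, as a function of rho and W."""
    return 1.0 + rho**2 * w / (1.0 - rho**2)

w = estimate_w(t=4, k=4)
print(round(w, 3), round(method3_index(0.6, w), 3))
```

Because W is well below 1 for any sensible blocking, the index stays well below the value $(1-\rho^2)^{-1}$ that applies when x is ignored.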
The general conclusion from the values of $I_t^{(2)}$ and $I_t^{(3)}$ is that Method III is somewhat better than Method II if $\rho < 0.6$ and that Method II becomes appreciably better than Method III only when $\rho$ is as large as 0.8 or more. It makes little difference if the comparison is based on $I_a$ instead of on $I_t$. In larger experiments with moderate and large k, both methods will be effective in reducing the value of $I_t$ to near unity, except when $\rho$ is very near unity, when the use of Method III will be inadvisable. However, Method III will remain reasonably effective for any form of smooth regression between y and x, not just for linear regression. If the regression is linear, but the distribution of x is leptokurtic, the randomized block method is likely to be relatively less effective due to the end-blocks having units with widely discrepant values of x.

Method IV. The argument is similar when a Latin square is used, except that W', equal to the expected residual mean square in the two-way array of x's formed from the rows and columns of the square, replaces W. Table 2 gives the value of W', and of $I_t^{(4)}$ in certain cases. If we have r squares, each of size t × t, the residual degrees of freedom are (t - 1)(rt - r - 1), when the residual within squares and the treatment × squares terms are combined. This number is small in the cases examined.

The additional precision gained by eliminating 'order within blocks' makes the critical value of $\rho$ at which Methods II and IV are approximately equivalent equal to about 0.8.

Method V. This is the use of x simultaneously for blocking and for covariance correction. There are two possibilities. We may analyse the design as a randomized block with covariance, estimating the regression coefficient from the residual line of the analysis of covariance. Equation (6) applies with $R_{xx}$ still defined as the residual sum of squares, this time in the randomized block analysis. We again require the expectation of $T_{xx}/R_{xx}$ over the randomization with the x's fixed;

$$E\left(\frac{T_{xx}}{R_{xx}}\right) \geq \frac{E(T_{xx})}{E(R_{xx})} = \frac{1}{k-1}, \tag{10}$$

with near equality in a large design. Hence

$$I_t^{(5)} \geq 1 + \frac{1}{(t-1)(k-1)}, \tag{11}$$

and the right-hand side of (11) can be used as an approximation to $I_t^{(5)}$; the error will be of the order of $[1/\{(t-1)(k-1)\}]^2$, as can be shown by the expansions based on methods of large-sample theory.
The second possibility is to analyse the design as if it were completely randomized. This would be in order if the arrangement into blocks has no effect on the y-values other than that due to correlation with the x's; and if the assumptions of the least squares model can be postulated, i.e. if we use more than pure randomization theory. In this case the argument
(12) 


Para 
lel to (10) leads to ' mx 
pirg 


again with near-equality in a large experiment.

The numerical values in Table 2 and a direct comparison of the formulae show that, for k > 3, the lower limit for $I_t^{(5)}$ is greater than $I_t^{(2)}$. Hence, under the conditions postulated, Method V is inferior to Method II. Even with the second method of analysis, i.e. with (12), it seems unlikely that there is appreciable gain in average precision over Method II under the conditions assumed.

Similar conclusions are reached from J®) and J”. 

Method VI. In this we consider a randomized block design with grouping based on a criterion separate from x. Quantitative investigation of this, based, for example, on the assumption that x, y and the property determining the grouping have a trivariate normal distribution, has not been attempted. We can, however, deal with two limiting cases. The system of blocking may be identical to that based on x. Equation (11) is then applicable.

Or the criterion for grouping may be independent of x, in which case

$$I_t^{(6)} = 1 + \frac{1}{(t-1)(k-1)-2}. \tag{13}$$

For the smaller values of (t - 1)(k - 1) among the designs investigated, this is about $1.15\, I_t^{(2)}$, showing that in these cases the additional system of blocking should be included only if there is a reasonable prospect of a reduction of 20 % or more in residual variance. For the larger values of (t - 1)(k - 1), $I_t^{(6)}$ and $I_t^{(2)}$ are very nearly equal.
Method VII. This method is theoretically the most efficient one when the observations on y are built up of a polynomial trend on x plus treatment effect plus random error of constant mean and dispersion. The method is best illustrated by an example: suppose that N = 9, t = k = 3, that a second degree curve is considered adequate to represent the regression of y on x, and that the values of x are in order

-1.3, -0.9, -0.8, -0.5, -0.2, 0.0, 0.4, 0.7, 1.1.

The most systematic procedure is to start by forming first and second degree orthogonal polynomials, $\xi_1'$, $\xi_2'$, from these observations. If $m_r = \sum x^r$, these are

$$\xi_1' = x - \frac{m_1}{m_0}, \qquad \xi_2' = x^2 - \left(\frac{m_0 m_3 - m_1 m_2}{m_0 m_2 - m_1^2}\right)x + \left(\frac{m_1 m_3 - m_2^2}{m_0 m_2 - m_1^2}\right). \tag{14}$$

The numerical values are

ξ1':  -1.13, -0.73, -0.63, -0.33, -0.03,  0.17,  0.57, 0.87, 1.27;
ξ2':   0.89,  0.08, -0.07, -0.40, -0.55, -0.56, -0.32, 0.07, 0.86.

These are then normalized by dividing by $\sqrt{\sum \xi'^2}$ to give $\xi_1$ and $\xi_2$, namely

ξ1:  -0.50, -0.33, -0.28, -0.15, -0.01,  0.08,  0.25, 0.39, 0.57;
ξ2:   0.57,  0.05, -0.04, -0.26, -0.35, -0.36, -0.21, 0.04, 0.55.

We have to select from the nine units, three to receive $T_1$, etc. Let $\sum_1 \xi_1$, $\sum_1 \xi_2$, etc. denote the sum over those units receiving $T_1$ of $\xi_1$, $\xi_2$. This gives us six numbers. The treatment arrangement that minimizes the sum of squares of these six numbers is very nearly, if not exactly, the most precise arrangement. Trial and error shows this arrangement to be


x:           -1.3  -0.9  -0.8  -0.5  -0.2   0.0   0.4   0.7   1.1
Treatment:    T1    T3    T2    T2    T3    T1    T1    T3    T2

(a non-randomized block design) with

            T1      T2      T3
Σξ1       -0.17    0.14    0.05
Σξ2        0.00    0.25   -0.26


The sum of squares of these numbers is $S_{\min} = 0.1811$, and the value of $I_t$ is, in general, approximately (Cox, 1951)


2) gis (1 - Su]. as) 


"TM EN mE m 
t mi k(t—1) min. 


which in this case is 1.03. (The residual degrees of freedom would be 4 if both linear and quadratic trend were removed and 5 if only linear trend is removed.)

A systematic search for the optimum arrangement is tedious although it is usually easy to find quickly an arrangement that is nearly the best. If the degree of the polynomial is small compared with k, it will usually be possible to find an arrangement with $I_t$ negligibly greater than 1. The disadvantage of this method is the lack of randomization; the method is of most value in single small experiments.
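The search described for Method VII is small enough to carry out exhaustively for N = 9. The sketch below builds the orthogonal polynomials from the moments $m_r = \sum x^r$, normalizes them, and evaluates the sum of squares of the six treatment sums. Two things are assumptions made here rather than statements of the paper: the coded arrangement `quoted` is one that is consistent with the treatment sums given in the text (T1 on units 1, 6, 7; T2 on units 3, 4, 9; T3 on units 2, 5, 8), and the brute-force search replaces the trial and error mentioned in the text.

```python
import itertools
import math

x = [-1.3, -0.9, -0.8, -0.5, -0.2, 0.0, 0.4, 0.7, 1.1]
n = len(x)

# Power sums m_r = sum of x**r, r = 0..3 (0.0**0 is 1.0 in Python, so m0 = 9).
m0, m1, m2, m3 = (sum(v**r for v in x) for r in range(4))
den = m0 * m2 - m1**2

# First and second degree orthogonal polynomials evaluated at the x's.
xi1_raw = [v - m1 / m0 for v in x]
xi2_raw = [v * v - (m0 * m3 - m1 * m2) / den * v + (m1 * m3 - m2 * m2) / den for v in x]

def normalize(values):
    """Scale a vector to unit sum of squares."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]

xi1, xi2 = normalize(xi1_raw), normalize(xi2_raw)

def criterion(arrangement, t=3):
    """Sum of squares of the treatment sums of xi1 and xi2 (2t numbers)."""
    s = 0.0
    for tr in range(t):
        s += sum(a for a, g in zip(xi1, arrangement) if g == tr) ** 2
        s += sum(b for b, g in zip(xi2, arrangement) if g == tr) ** 2
    return s

# Assumed arrangement, treatments coded 0, 1, 2 for T1, T2, T3.
quoted = (0, 2, 1, 1, 2, 0, 0, 2, 1)
s_quoted = criterion(quoted)

# Exhaustive search over all arrangements with three units per treatment.
candidates = (a for a in itertools.product(range(3), repeat=n)
              if all(a.count(tr) == 3 for tr in range(3)))
best = min(candidates, key=criterion)
s_best = criterion(best)

print(round(s_quoted, 4), round(s_best, 4), best)
```

Working with unrounded polynomial values gives a criterion slightly different from the 0.1811 obtained in the text from two-decimal figures, so agreement should only be expected to about the second decimal place.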


5. DISCUSSION

In deciding what design to use in a particular case, we should consider

(i) the values for imprecision given above;
(ii) the extent to which departure from assumed conditions is likely to affect (i);
(iii) the importance to be attached to simplicity of design and analysis;
(iv) the extent to which considerations other than precision are relevant.
The general conclusion from the calculations in § 4 is that the methods based on covariance are preferable to the simpler methods based on blocking only if the correlation coefficient between y and x is at least 0.6, and that under the conditions postulated the systematic design, Method VII, is the most precise. For larger experiments all methods except the first are likely to give values of $I_t$ near unity.
The main assumptions made are the linearity of the regression and the normality of the distribution of x. Non-normality of x should have little effect on the efficiency of covariance analysis, while $I_t$ for the blocking methods will usually be an increasing function of the kurtosis of the distribution of x. If the regression is non-linear but smooth, blocking methods will remain effective, while covariance methods will not, unless the linear component of the regression is predominant, or multiple covariance is used.

The methods of design are all simple except Method VII. Details of analysis are, of course, simpler for methods not involving covariance.

There are two further considerations. The form of the relation between y and x may be of intrinsic interest, either in helping to understand the experimental material or in giving information useful in the design of further experiments. Also we may find that the treatment effects are not independent of x, i.e. that there is a treatment × x interaction. Such an interaction may give insight into the mechanism underlying the treatment effects and may also change any practical recommendations, at least quantitatively.
"Met Eidos dre isa we shall normally P7 efer 

a Je 


I am grateful to Dr B. G. Greenberg for very helpful discussion.
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APPROXIMATE CONFIDENCE LIMITS FOR COMPONENTS 
OF VARIANCE 


By M. G. BULMER 
Unit of Biometry, 6 Keble Road, Oxford 


1. STATEMENT OF PROBLEM 


Suppose that $M_1$ and $M_2$ are independent mean-square variates with $f_1$ and $f_2$ degrees of freedom and unknown expected values $(\theta + \sigma^2)$ and $\sigma^2$ respectively, where $\theta$ and $\sigma^2$ are non-negative. In other words, $f_1 M_1/(\theta + \sigma^2)$ and $f_2 M_2/\sigma^2$ are independent chi-square variates with $f_1$ and $f_2$ degrees of freedom. The problem considered in this paper is to find confidence limits for $\theta$, the difference between the expected values of the two mean squares, that is to say, we want to find a function of $M_1$ and $M_2$, $f(M_1, M_2)$, such that, at least approximately,

$$\Pr[f(M_1, M_2) < \theta] = \alpha \tag{1}$$

whatever $\theta$ and $\sigma^2$ are. The function, f, will of course also depend on the value of $\alpha$, and we shall be interested in the cases when $\alpha$ is either small or large. If $\alpha = 0.05$, for example, the assertion '$\theta < f(M_1, M_2)$' will be correct 95 % of the time and f will be called the upper 95 % confidence limit for $\theta$; if, on the other hand, $\alpha = 0.95$, the assertion '$\theta > f(M_1, M_2)$' will be correct 95 % of the time and f will be called the lower 95 % confidence limit for $\theta$. The interval between the lower and upper 97.5 % limits is a 95 % confidence interval for $\theta$.

Such confidence limits will often be useful for estimating the components of variance which arise in the infinite model of the analysis of variance. Approximate solutions of this problem have been given in four recent papers (Bartlett, 1953; Green, 1954; Huitson, 1955; Welch, 1956). A different approach is adopted in this paper which leads to a solution, given in equation (6), at once simple and reasonably accurate. In the second half of the paper, the accuracy of this solution is compared with that of the other approximations which have been suggested.

2. AN APPROXIMATE SOLUTION OF THE PROBLEM

It is reasonable to suppose that the confidence limit $f(M_1, M_2)$ can be written in the form $M_2 g(F)$, where $F = M_1/M_2$; for if all the observations underlying the analysis are multiplied by a constant, c, then $M_1$, $M_2$ and $\theta$ are each multiplied by $c^2$ and the confidence limit should also be multiplied by $c^2$. Therefore $f(c^2 M_1, c^2 M_2) = c^2 f(M_1, M_2)$, and the result follows if we put $c^2 = 1/M_2$. We must therefore find a function, g, such that

$$\Pr[M_2\, g(F) < \theta] = \alpha \tag{2}$$

whatever $\theta$ and $\sigma^2$ are. But

$$\Pr[M_2\, g(F) < \theta] = \Pr[F < g^{-1}(\theta/M_2)] = \Pr[M_1 < M_2\, g^{-1}(\theta/M_2)],$$

where $g^{-1}$ is the inverse function of g; it is assumed that g(F) is a continuous and increasing function of F so that the inverse exists. Now write $m_1 = M_1/(\theta + \sigma^2)$, $m_2 = M_2/\sigma^2$ and $\rho = \theta/\sigma^2$, so that $m_1$ and $m_2$ are standardized mean-square variates with $f_1$ and $f_2$ degrees of freedom. Then

$$\Pr[M_1 < M_2\, g^{-1}(\theta/M_2)] = \Pr\left[m_1 < \frac{m_2\, g^{-1}(\rho/m_2)}{\rho + 1}\right],$$
ka * 
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and we want to find a function g which satisfies the equation 


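$$\Pr\left[m_1 < \frac{m_2\, g^{-1}(\rho/m_2)}{\rho + 1}\right] = \int_0^\infty P_1\left[\frac{m_2\, g^{-1}(\rho/m_2)}{\rho + 1}\right] p_2(m_2)\, dm_2 = \alpha \tag{3}$$

for any $\rho$, where $P_1$ is the cumulative distribution function of a standardized mean-square variate with $f_1$ degrees of freedom, and $p_2$ is the frequency function of a standardized mean-square variate with $f_2$ degrees of freedom. If this can be done, then $M_2\, g(F)$ will be the required limit for $\theta$.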
Two limiting conditions can be placed on g(F). Let us write $L_1$ for the lower 100α % point of the F-distribution with $f_1$ and $f_2$ degrees of freedom and $L_\infty$ for the corresponding lower 100α % point with $f_1$ and ∞ degrees of freedom.* Then it is obvious that the limit $M_2\, g(F)$ should be zero when $F = L_1$. Also $g(F) \sim F/L_\infty$ as $F \to \infty$, since

$$\Pr[M_2 F/L_\infty < \theta] = \Pr[M_1/\theta < L_\infty] \to \alpha \quad \text{as } \rho \to \infty,$$

and since F will almost certainly not be very large unless $\rho$ is also very large. Thus we want to find a function, g(F), which satisfies the conditions

$$g(L_1) = 0, \qquad g(F) \sim F/L_\infty \ \text{as } F \to \infty. \tag{4}$$

Any function satisfying these conditions will give an exact confidence limit when $\rho = 0$ and when $\rho \to \infty$.


The simplest function satisfying these conditions is the linear approximation, $g_1(F)$:

$$g_1(F) = (F - L_1)/L_\infty. \tag{5}$$

This is also an exact solution of the problem for all $\rho$ when $f_2$ is infinite ($f_1$ remaining finite), but for values of $\rho$ between 0 and infinity it gets worse as an approximation the smaller $f_2$ is. The adequacy of this approximation was tested by direct numerical integration of (3) with $g_1$ substituted for g, and both the lower and the upper limits were found to be too wide. I therefore introduced a term in $F^{-1}$ and tried as a second approximation

$$g_2(F) = \frac{F}{L_\infty} - 1 - \frac{L_1}{F}\left(\frac{L_1}{L_\infty} - 1\right). \tag{6}$$


«a tio? 
d the linear ee ) 
error (P—«a), where P = Pr [Maga zap?” 
is the one considered in the rest of this I in 


} al 
sometimes be obtained for 0. It is suggested th - 
this case the limit should be taken as zero. If this is done the u 


ans 
h find a confidence interval for 0 by me 

ed significance levels. 

* It is important to notice that when we are considering, for example, the lower 95 % limit for $\theta$, then $\alpha = 0.95$ and $L_1$ is the upper 5 % point of F with $f_1$ and $f_2$ degrees of freedom; but that if we are considering the upper 95 % limit for $\theta$, then $\alpha = 0.05$ and $L_1$ is the lower 5 % point of F with $f_1$ and $f_2$ degrees of freedom, which is the same as the reciprocal of the upper 5 % point of F with $f_2$ and $f_1$ degrees of freedom. Similar remarks apply to $L_\infty$ with ∞ substituted for $f_2$.

As an example of this method, consider an experiment quoted in Davies (1947). Six samples of penicillin were each grown in turn on the same set of twenty-four Petri dishes, and the diameters of the zones of inhibition of bacterial growth were measured in millimetres. The mean square for differences between dishes was 4.61 and had 23 degrees of freedom, and the error mean square was 0.304 with 115 degrees of freedom. If we can regard the dishes as a random sample from a normal universe of dishes with variance $\sigma_D^2$, then

$$\theta = E(M_1) - E(M_2) = 6\sigma_D^2.$$


The lower and the upper 97.5 % confidence limits for $\theta$ given by $M_2\, g_2(F)$ are 2.48 and 8.80 mm.² respectively; thus the 95 % confidence interval for $\sigma_D^2$ is 0.413 to 0.918 mm.².
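The calculation in this example can be sketched as follows, assuming the second approximation has the form $g_2(F) = F/L_\infty - 1 - (L_1/F)(L_1/L_\infty - 1)$. The percentage points $L_1$ and $L_\infty$ used below are approximate values supplied here for illustration, not figures taken from the paper: $L_\infty$ is a chi-square percentage point for 23 degrees of freedom divided by 23, and $L_1$ is a rough interpolation in F tables. The function names are likewise illustrative. A negative limit is replaced by zero, as the text suggests.

```python
def g2(F, L1, Linf):
    """Second approximation: zero at F = L1 and asymptotic to F/Linf for large F."""
    return F / Linf - 1.0 - (L1 / F) * (L1 / Linf - 1.0)

def confidence_limit(M1, M2, L1, Linf):
    """Approximate limit M2 * g2(F) for theta, with negative values set to zero."""
    return max(0.0, M2 * g2(M1 / M2, L1, Linf))

M1, M2 = 4.61, 0.304  # mean squares with 23 and 115 degrees of freedom

# Lower 97.5 % limit: alpha = 0.975, so upper 2.5 % points are needed.
# Assumed approximate values: L1 ~ 1.78, Linf ~ 38.08 / 23 ~ 1.655.
lower = confidence_limit(M1, M2, L1=1.78, Linf=1.655)

# Upper 97.5 % limit: alpha = 0.025, so lower 2.5 % points are needed.
# Assumed approximate values: L1 ~ 0.49, Linf ~ 11.69 / 23 ~ 0.508.
upper = confidence_limit(M1, M2, L1=0.49, Linf=0.508)

print(round(lower, 2), round(upper, 2))
```

The result is insensitive to the exact value of $L_1$ here, because F is large and the term in $L_1/F$ is correspondingly small; the limit is governed almost entirely by $L_\infty$.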


3. THE ACCURACY OF THE APPROXIMATION 


$P = \Pr[M_2\, g_2(F) < \theta]$ is exactly equal to $\alpha$ when $\rho$ is zero or infinite or when $f_2$ is infinite ($f_1$ remaining finite). The purpose of this section is to study the behaviour of P for intermediate values of $\rho$. If we write $r = f_1/f_2$, then we can show that, for large $f_1$ and $f_2$, P depends only on $\rho$ and r and not on the actual values of $f_1$ and $f_2$. In fact, we can write
P = Pr[z <0], where 


my(p +1) 


a= Tt (pt Le 


tant, then we can write 


ij go d 
mils f- z)-^ (7) 


If w 
el 
et f, and f, tend to infinity while r = filfe remains cons 
1 £f i 2/95 and I4 —1 - £ Ifi), 
a = 0:05, for example, 
becomes 


" Ly = 
here £ ; 
ere £, is the upper 100x % p 


am " 
"lih 6449, and when a = 0:95, 
Ximately normally distributed with mean 


oint of the normal distribution; when 
E, = —1:6449. We can therefore see that z 


2\% 2 + 2f) — IA 
gfe +1) (5) iier en, " 
and vari i 
ance (p+ 1)? (2/f1) + (2/f;). Thus 
(9) 


: - 
E(z) in OFS e zl : 
78er (p? (p+ 
y-a +r)i+1. Thus the extreme 


For 
fixed » th; 
ed 7, this quantity is at its maximum when (p! 


Value of p 
fp 18 about e(l +r)ł-—(1 ze (10) 
Re ae an 
vi ett T 
here f 
(&.) is t ; «vo at £,. Thus we can calculate that when f, 
he ordinate of the normal curve mi rs account Teihana aig 


ted that (P — &) is of order 7?. 
arge. In order to investigate 
+ alfa + tolf? + It is 
can be calculated. The 
aken over all possible 


ar a 
Maxima here anda = 0-05, P lies between 0°05 ane" 
quas; T and 4 are 0-05087 and 0.05026. It will bene 

ls on (10) gives us the extreme values of P when fi i5 


lat h ! 
: D. w When f, is small I have expanded P in the form & 
all p. It is also shown how o 


Wn in 

Maxi the Ap D ix P 

m pendix that o, = 0 for ee " 
hs aes ot as Siima values df aeti uu ped table gives a good idea 
of th » are given i . several values of J1 ARS A à 

[A giver a 1, for several Vë = : 

behaviour E A in Table ^ SE P for all values of fy When fi is large. formula (10) 
the extreme values Biota: aie 


S 


a 


lr 
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can be expanded to give maxc (for a< 1) or minc (for «> 1); in either case the piner 
extremum is zero. In the case «<4, minc is zero for all f, and max c increases steadily to 
its limiting value as f, increases. In the case g > 1, the position when f, is small is the reverse 
of that when f, is large; minc is zero and maxc is positive (and is larger the smaller f, is): 


For moderate values of f, both the maximum and minimum values of c are non-zero. 
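The large-sample extremes quoted in this section (0.05252, 0.05087 and 0.05026 for $r = 1, \tfrac12, \tfrac14$ at $\alpha = 0.05$) can be reproduced numerically. The sketch below assumes that the extreme value takes the form $\alpha + \xi_\alpha f(\xi_\alpha)\{(1+r)^{1/4} - 1\}^2/\{1 + (1+r)^{1/2}\}$; this is a reading of equation (10) that agrees with the quoted figures, and the function name is illustrative.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def extreme_P(alpha, r):
    """Large-sample extreme of P over rho, for r = f1/f2 (assumed form of (10))."""
    xi = NormalDist().inv_cdf(1.0 - alpha)            # upper 100*alpha % normal point
    ordinate = exp(-0.5 * xi * xi) / sqrt(2.0 * pi)   # f(xi), normal ordinate at xi
    s = sqrt(1.0 + r)
    return alpha + xi * ordinate * (sqrt(s) - 1.0) ** 2 / (1.0 + s)


for r in (1.0, 0.5, 0.25):
    print(f"r = {r:<4}  extreme P = {extreme_P(0.05, r):.5f}")
```

For $\alpha > \tfrac12$ the point $\xi_\alpha$ is negative, so the same expression gives a minimum lying below $\alpha$.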


Table 1. The minimum and maximum values of 100c over all values of $\rho$.
Min 100c for $\alpha < \tfrac12$ is always zero

  f_1     alpha   Max 100c     alpha   Min 100c   Max 100c

    2     0.01      0.00       0.99      0          4.38
    4     0.01      0.01       0.99      0          1.77
    8     0.01      0.06       0.99      0          0.70
   12     0.01      0.09       0.99      0          0.40
   24     0.01      0.14       0.99     -0.05       0.13
   60     0.01      0.17       0.99     -0.12       0.02
  120     0.01      0.19       0.99     -0.15       0
  inf     0.01      0.19       0.99     -0.19       0

    2     0.025     0.00       0.975     0          5.84
    4     0.025     0.02       0.975     0          2.38
    8     0.025     0.11       0.975     0          0.93
   12     0.025     0.17       0.975    -0.02       0.51
   24     0.025     0.25       0.975    -0.13       0.15
   60     0.025     0.31       0.975    -0.26       0.01
  120     0.025     0.34       0.975    -0.30       0
  inf     0.025     0.36       0.975    -0.36       0

    2     0.05      0.00       0.95      0          6.50
    4     0.05      0.02       0.95      0          2.60
    8     0.05      0.16       0.95      0          1.01
   12     0.05      0.25       0.95     -0.08       0.53
   24     0.05      0.37       0.95     -0.27       0.14
   60     0.05      0.46       0.95     -0.42       0.01
  120     0.05      0.49       0.95   [illegible]   0
  inf     0.05      0.53       0.95     -0.53       0


If we write $c_0$ for one of the extreme values of $c$ in Table 1, then the Taylor expansion of the extreme value of $P$ would be $\alpha + c_0 r^2 + \dots$. However, as $f_1 \to \infty$, $c_0 r^2 \to \xi_\alpha f(\xi_\alpha)\,r^2/32$ (see equation (A 10) of the Appendix), whereas the extreme value of $P$ tends to (10). We may therefore hope to get a better approximation for the extreme value of $P$ from the equation


- ssi 9 (i+rt-(1 a, a) 


r 
e 
pa” 
which reduces to (10) when $f_1$ and $f_2$ tend to infinity together. The extreme values of $P$ have been obtained by direct numerical integration of (3) in some cases, and these exact values are compared with the approximations given by (11) in Table 2. The approximation seems to be fairly good. It can be concluded that $P$ will not differ appreciably from $\alpha$ in any situation likely to be met with in practice, and that for any value of $f_1$ the accuracy becomes better rapidly as $r$ becomes smaller (roughly in proportion to $r^2$).


Table 2. Comparison of the extreme values of P obtained by numerical
integration with the approximate values given by equation (11)

  f_1   f_2   alpha     Exact range            Approximate range

    8     6   0.05      0.0500  to 0.0514      0.0500  to 0.0511
    8    12   0.05      0.05000 to 0.05048     0.05000 to 0.05041
   24    24   0.05      0.0500  to 0.0520      0.0500  to 0.0518
   24    48   0.05      0.05000 to 0.05064     0.05000 to 0.05061
    2     6   0.95      0.9500  to 0.9550      0.9500  to 0.9554
    2    12   0.95      0.9500  to 0.9515      0.9500  to 0.9515
    8     6   0.95      0.9500  to 0.9586      0.9500  to 0.9571
    8    12   0.95      0.9500  to 0.9530      0.9500  to 0.9526
   24    24   0.95      0.9491  to 0.9511      0.9487  to 0.9507
   24    48   0.95      0.94962 to 0.95030     0.94956 to 0.95023


4. COMPARISON WITH OTHER APPROXIMATIONS

$M_1 - M_2$ is an unbiased estimator of $\theta$ with variance $2w_1(\theta + \sigma^2)^2 + 2w_2\sigma^4$, where $w_1 = 1/f_1$ and $w_2 = 1/f_2$. If we estimate $(\theta + \sigma^2)^2$ by $M_1^2$ and $\sigma^4$ by $M_2^2$ and regard $(M_1 - M_2)$ as an approximately normal variate, then an approximate confidence limit for $\theta$ is

$$M_1 - M_2 + \xi_\alpha(2w_1M_1^2 + 2w_2M_2^2)^{1/2} = M_2[F - 1 + \xi_\alpha(2w_1F^2 + 2w_2)^{1/2}], \qquad (12)$$

where $\xi_\alpha$ is the upper $100\alpha$ % point of the normal distribution. This approximation is well known and will be called the first normal approximation. Welch (1956) has developed a series solution, analogous to the Cornish-Fisher expansion, for the general problem of finding confidence limits for linear combinations of several variances. For the special case of the difference between two variances considered in this paper, his first approximation is the first normal approximation given by (12); his second approximation, which will be called the second normal approximation, has an error of order $(w_1, w_2)$ and is
s i im | 13 
adr -14£Qw, F^ 20, 8€) (w, 2+ we) J’ on 


and his third approximation, which will be called the third normal approximation, has an error of order $(w_1^{3/2}, w_2^{3/2})$ and is


^| 


2 
(u P919) 4 9(200, F? + 2002) + 


PALE (2w, F+ 20) + 8082 D (5, FEF ws) 
2 F3 — wy)? 
og FS ud) 23 eir (e 
x | "Etre + HOE e EN a -iu05-? Ea) (v, F3 + w)? 


The second normal approximation is equivalent to Bartlett's (1953) approximation to the order of magnitude to which it is correct.

Huitson (1955) has developed an alternative expansion, by a slightly different approach, for the general problem of finding confidence limits for linear combinations of variances. He was concerned with the problem of estimating the total variability (that is, the sum of two or more variances), and his method seems entirely satisfactory for this purpose. It is not satisfactory, however, in the case of the difference between two variances considered in this paper. Huitson's expansion, when applied to this special case, becomes

$$\frac{M_2(F-1)}{1 - \xi_\alpha(2w_1F^2 + 2w_2)^{1/2}(F-1)^{-1} + \dots}. \qquad (15)$$

It is easily seen that this is an unsuitable approximation near $F = 1$. For consider (15) taken as far as the large sample approximation (i.e. as far as it is actually written explicitly). The limit is zero only when $F = 1$, which contradicts the first condition of (4). Furthermore, the upper limit is infinite for some critical value of $F$, say $F_U$, greater than 1, for which the denominator becomes zero, and the lower limit becomes infinite for another value of $F$, say $F_L$, less than 1. When $F_L < F < F_U$, the upper limit is negative and the lower limit is positive! It is true that (15) leads to Welch's first expansion if the denominator, written as $(1-h)^{-1}$, is expanded in a formal Taylor series; but, when $F$ is near 1, $h$ becomes very large and the Taylor expansion diverges.


Table 3. The limiting probabilities that $\theta$ is greater than the three normal
approximations as either $f_2$ or $\rho$ approaches infinity

          First normal                Second normal            Third normal
          approximation               approximation            approximation
  f_1   alpha=0.05   alpha=0.95     alpha=0.05  alpha=0.95   alpha=0.05  alpha=0.95

   30    0.11392      0.99266        0.06303     0.92551      0.05306     0.95550
   60   [illegible]   0.98377        0.05688     0.93931      0.05120     0.95188
  120   [illegible]   0.97544        0.05359     0.94513      0.05045     0.95063
  240   [illegible]  [illegible]     0.05186     0.94710      0.05017     0.95022
  480   [illegible]  [illegible]     0.05095     0.94889      0.05006     0.95007
  960   [illegible]  [illegible]     0.05049     0.94946      0.05002     0.95003


The accuracy of the three normal approximations (12), (13) and (14) depends on both $f_1$ and $f_2$ being large; the accuracy of my approximation depends mostly on the ratio $f_1/f_2$ and does not get any better if they are both increased together. We can investigate the accuracy of the normal approximations as either $f_2$ or $\rho$ approaches infinity as follows. When $f_2 \to \infty$, the first normal approximation (12), for example, becomes $M_1 - \sigma^2 + \xi_\alpha M_1(2w_1)^{1/2}$ and

$$P = \Pr[M_1 + \xi_\alpha M_1(2w_1)^{1/2} < \theta + \sigma^2] = \Pr[m_1 < \{1 + \xi_\alpha(2w_1)^{1/2}\}^{-1}],$$

where $m_1$ is a standardized mean-square variate with $f_1$ degrees of freedom. When $f_2$ is finite but $\rho \to \infty$, terms in (12) not involving $F$ can be ignored and

$$P = \Pr[M_1 + \xi_\alpha M_1(2w_1)^{1/2} < \theta] = \Pr[m_1 < \{1 + \xi_\alpha(2w_1)^{1/2}\}^{-1}]$$

as before. Thus the limiting probability as either $f_2$ or $\rho$ tends to infinity can be obtained from tables of the chi-square integral. The limiting values for the second and third normal approximations, corresponding to the reciprocal of $1 + \xi_\alpha(2w_1)^{1/2}$, are easily found to be the reciprocals of

$$1 + \xi_\alpha(2w_1)^{1/2} + \tfrac23(2\xi_\alpha^2 + 1)w_1 \quad\text{and}\quad 1 + \xi_\alpha(2w_1)^{1/2} + \tfrac23(2\xi_\alpha^2 + 1)w_1 + \tfrac1{36}(2w_1)^{3/2}(13\xi_\alpha^3 + 17\xi_\alpha)$$

respectively. The corresponding probabilities for $\alpha = 0.05$ and 0.95 are given in Table 3.
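The limiting probabilities can be evaluated without tables. The following sketch computes $\Pr[m_1 < 1/D]$, where $f_1 m_1$ is distributed as chi-square with $f_1$ degrees of freedom and $D$ is one of the three expressions above. The chi-square integral is obtained from the standard series for the incomplete gamma function; the function names and the `order` argument are illustrative, and $f_1 = 30$ is assumed here to be the first tabulated value.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def chi2_cdf(x, df):
    """Pr[chi-square on df degrees of freedom <= x], via the series for the
    regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)       # next term: z^n / (a (a+1) ... (a+n))
        total += term
    return total * exp(a * log(z) - z - lgamma(a))


def limiting_P(f1, alpha, order=1):
    """Limiting P for the first, second or third normal approximation."""
    xi = NormalDist().inv_cdf(1.0 - alpha)   # upper 100*alpha % normal point
    w1 = 1.0 / f1
    denom = 1.0 + xi * sqrt(2.0 * w1)
    if order >= 2:
        denom += (2.0 / 3.0) * (2.0 * xi * xi + 1.0) * w1
    if order >= 3:
        denom += (2.0 * w1) ** 1.5 * (13.0 * xi ** 3 + 17.0 * xi) / 36.0
    return chi2_cdf(f1 / denom, f1)          # Pr[m1 < 1/denom]


for order in (1, 2, 3):
    print(order, round(limiting_P(30, 0.05, order), 5),
          round(limiting_P(30, 0.95, order), 5))
```

Doubling $f_1$ roughly halves the error of the second approximation, which is one way to see why the first normal approximation converges so slowly by comparison.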
We can now judge the accuracy of the approximations if we confine ourselves to these limiting values. Thus the first normal approximation is rather bad even when $f_1$ is large; the second normal approximation will only be better than mine under rather exceptional circumstances (the maximum deviations $|P - \alpha|$ for my approximation are, when $f_1$ is large and $\alpha = 0.05$, 0.00252, 0.00087 or 0.00026 when $f_1/f_2$ is 1, $\tfrac12$ or $\tfrac14$). The third normal approximation will often be better than mine, but it seems doubtful whether the increase in accuracy is worth the increased labour of evaluating it.

The accuracy of Huitson's approximation (15) for large $\rho$ can be investigated in the same way, although the argument about the limiting probability as $f_2 \to \infty$ no longer holds. It turns out that if we take Huitson's approximation as far as the term of a given order in $w_1$ and $w_2$, then as $\rho \to \infty$, $P \to \Pr[m_1 < C]$ for all $f_2$, where $C$ is the Cornish-Fisher expansion for the percentage point of $m_1$ taken as far as the corresponding term. These limiting probabilities are given in Table 4. It can be seen that for $\rho \to \infty$ Huitson's approximation is considerably better than the normal approximation of the corresponding order. It might, therefore, be possible to obtain a much better approximation by using the normal approximation for small $\rho$ and Huitson's approximation for large $\rho$.

Table 4. The limiting probabilities that $\theta$ is greater than
Huitson's approximation as $\rho$ approaches infinity

          Huitson's first             Huitson's second         Huitson's third
          approximation               approximation            approximation
  f_1   alpha=0.05   alpha=0.95     alpha=0.05  alpha=0.95   alpha=0.05  alpha=0.95

   30    0.03066      0.93826        0.04823     0.95107      0.05009     0.95004
   60    0.03725      0.94105        0.04919     0.95058      0.05002     0.95001
  120    0.04145      0.94334        0.04962     0.95031      0.05000     0.95000
  240   [illegible]   0.94511       [illegible]  0.95016      0.05000     0.95000
  480    0.04598      0.94645        0.04991     0.95008      0.05000     0.95000
  960   [illegible]  [illegible]     0.04995     0.95004      0.05000     0.95000

Daniels has suggested a modification of the first normal approximation (12). He points out that $M_1^2$ and $M_2^2$ are not unbiased estimators of $(\theta + \sigma^2)^2$ and $\sigma^4$, whereas $M_1^2(1 + 2/f_1)^{-1}$ and $M_2^2(1 + 2/f_2)^{-1}$ are, and that the sampling variance of $(M_1 - M_2)$ should therefore be estimated by $2M_1^2/(f_1 + 2) + 2M_2^2/(f_2 + 2)$ in (12) instead of by $2M_1^2/f_1 + 2M_2^2/f_2$. The first normal approximation is, however, so bad that Daniels's modification makes little difference.

Green (1954) has developed approximate confidence limits by a method more like the one used in this paper. His solution is incapable of application until it has been tabulated. Green gives only one example of the accuracy of his approximation, that $P = 0.97492$ when $f_1 = 8$, $f_2 = 50$, $\rho = 1$ and $\alpha = 0.975$. From equation (11) it can be calculated that $P$ for my approximation varies between 0.97500 and 0.97520 when $f_1 = 8$, $f_2 = 50$ and $\alpha = 0.975$.


APPENDIX 


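We wish to expand

$$P = \int_0^\infty p_2(m_2)\,P_1\!\left[\frac{m_2\,g_\alpha^{-1}(\rho/m_2)}{\rho+1}\right]dm_2 \qquad (A\,1)$$

in the form $P = \alpha + \alpha_1/f_2 + \alpha_2/f_2^2 + \dots$, and in particular to show that $\alpha_1 = 0$.
Now, solving equation (6) for $F$ in terms of $g_\alpha(F) = v$, say, we find

$$F = g_\alpha^{-1}(v) = \tfrac12 L_1(1+v) + [\tfrac14 L_1^2(1+v)^2 - L_2(L_1 - L_2)]^{1/2}. \qquad (A\,2)$$

(The positive square root must be taken since $v = g_\alpha(F) \sim F/L_1$ for large $F$, or $F \sim L_1 v$ for large $v$.)
Expanding (A 2) in powers of $1/f_2$ we have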
L MÁU +0)? + (L/L) (2o v?) 
fal +) f80 v) 


where $L = L_1$, and $L'$, $L''$ denote the successive derivatives of $L_2$ with respect to $1/f_2$ at $f_2 = \infty$. Writing $m_2 = 1 + x$ (so that the expected value of $x$ is zero) we now find, expanding in powers of $1/f_2$ and $x$,


gz (v) = L0 +0) + 


+05 ) ae 


masgg*(p[m,) _ L xl 1 1+2p pix? | 
- = e+ + e+ 
pl lp faLO+p Qo ^ (+p) 
1 p L'*(2p- p?) A4) 
+ n: A -3 f- —143 ( 
n E Lüxpy | E a Sg mh 
and, expanding $P_1$ about $L$, we find
-1 
msg; (p/ms) pL pL p" Là "LA 
al 22a =a+ z+ og Ms 3 p 
EZ 1xp^* aapt * ppp * ap 
RT pE | pL/(2p)--p'LL' || pl/p?- p LL'(V4- 2p) +4p" D4L/ | 
— - sp = a = sd 
fs LU. +p) (1+p)* (1+p)! 
1 [ ph" | ph’*(2p+p*)/L+4p'L* a Ab) 
uo. (+p) Jeou “fete fz aa) ( 


Here $p, p', \dots$ denote $p_1(L) = P_1'(L)$, $p_1'(L)$, ..., and we note that $P_1(L) = \alpha$. Substituting this expression in (A 1), and noting that $\int x^r p_2(m_2)\,dm_2$ is the $r$th central moment of $m_2$, we find


pL*pL/ 


= 1 n" 
TUM fol+p)? ^ f20 +p) Sp" D'Q +p) + kp" D^ + 2pL'p? 


^ ^ a” + " 6) 
*3p LL 3p) + p"LAL! + YpL"(1 py - (pL? L) (2p p?) +p r+ A 
This expression can be thrown into a more convenient form by replacing the derivatives of the percentage point $L', L'', \dots$ by their expressions in terms of $p, p', \dots$. These relations can be obtained by expanding the expression

$$\int_0^\infty p_2(m_2)\,P_1(L_2 m_2)\,dm_2 = \alpha \qquad (A\,7)$$

in powers of $1/f_2$ and equating coefficients. The same result can be found simply by putting $\rho = 0$ in (A 6) (when the left-hand side becomes exactly $\alpha$). Thus, as remarked previously, the coefficient of $1/f_2$ in (A 6) vanishes identically, and we have finally


agg, FETA dps » 
+( ap" Lt+ $ UNE (p+) | «ous i k 
The quantity $c$ defined in §3 is the second term on the right of (A 8) with the factor $1/f_2^2$ replaced by $1/f_1^2$. It will be recollected that $L = L_1$ is the lower $100\alpha$ % point of a mean square with $f_1$ degrees of


1 
P=a+ ——— [ — $n" L*(p + p?) — 2p' Lp? + & 


Ji py Cpto) 


o 
freedom, or of a $\chi^2/f_1$ distribution with $f_1$ degrees of freedom; $p, p', \dots$ represent the ordinate of this distribution at the point $\chi^2/f_1 = L$, and its derivatives with respect to $L$. The latter quantities can be easily evaluated using Molina's (1945) tables of the Poisson distribution.

Now consider the case where $f_1$ is large. Then the above $\chi^2/f_1$ distribution becomes nearly normal, with unit mean and variance $2/f_1$, so that $L$ is approximately $1 - \xi_\alpha(2/f_1)^{1/2}$, where $\xi_\alpha$ is the upper $100\alpha$ % normal deviate. If we write $f(\xi_\alpha)$ for the ordinate of the standardized normal curve at $\xi_\alpha$, we have approximately

$$p = (f_1/2)^{1/2} f(\xi_\alpha), \qquad p^{(r)} = (f_1/2)^{(r+1)/2} H_r(\xi_\alpha) f(\xi_\alpha),$$

where the $H_r$ are Hermite polynomials. Making these substitutions in the expression for $c$ provided by (A 8), we easily find

$$c = \tfrac18\,\frac{2\rho + \rho^2}{(1+\rho)^4}\,\xi_\alpha f(\xi_\alpha), \qquad (A\,9)$$

an expression also obtainable from (9). This function has a maximum at $\rho = \sqrt2 - 1$, when it takes the value

$$\max c = \tfrac1{32}\,\xi_\alpha f(\xi_\alpha) \quad (f_1 \text{ large}), \qquad (A\,10)$$

as indicated at the end of §3. This result can also be obtained from equation (10).


I should like to thank Mr A. M. Walker for helpful discussions; I should also like to thank the referee for a valuable criticism of the first draft of this paper.
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MULTIPLE RUNS 


By D. E. BARTON and F. N. DAVID
University College, London


1. The problem of finding the number of ways in which $r_1$ elements of one kind and $r_2$ of another can be arranged in a line to form a sequence of $2t$ or $2t+1$ groups was solved by Whitworth (1886, Problems 193 and 194), and it was probably not new to him. The solution was revived by Stevens (1939), Wald & Wolfowitz (1940) and Mood (1940). Sequences in which there are $k$ different kinds of elements have also been treated. Whitworth takes three kinds of elements with the same number of each, while Mood obtains a general solution for the distribution of the number of runs of one kind, given $k$ types of element. Using a simple generalization of Whitworth's method we show here how the distributions of the total number of runs can be built up for the multiple case from that for two alternatives. The method of the characteristic random variable is used to obtain reasonably compact expressions for the moments of the total number of multiple runs. The assumption of normality for this total number is not discussed by us at length, since it obviously follows from the work of Wald & Wolfowitz and of Mood. We show, however, an alternative limit which is Poisson.

2. Starting with the two alternatives case it is assumed that there are $r_1$ white balls and $r_2$ red balls. If they are arranged in a line randomly, then $T$, the total number of groups (or runs), will be made up of $t$ white groups and $t$ red groups, or $(t+1)$ white and $t$ red, or $t$ white and $(t+1)$ red. The number in the fundamental probability set is

$$\binom{r}{r_1}, \quad\text{where}\quad r = r_1 + r_2.$$

The probability distribution of $T$ is

$$P\{T = 2t\} = 2\binom{r_1-1}{t-1}\binom{r_2-1}{t-1}\bigg/\binom{r}{r_1}$$

and

$$P\{T = 2t+1\} = \left[\binom{r_1-1}{t}\binom{r_2-1}{t-1} + \binom{r_1-1}{t-1}\binom{r_2-1}{t}\right]\bigg/\binom{r}{r_1}.$$
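The two-colour result is easy to check by enumeration. The sketch below assumes the standard form of the distribution, $2\binom{r_1-1}{t-1}\binom{r_2-1}{t-1}$ arrangements with $T = 2t$ runs and $\binom{r_1-1}{t}\binom{r_2-1}{t-1} + \binom{r_1-1}{t-1}\binom{r_2-1}{t}$ with $T = 2t+1$, and compares it with a brute-force count; the helper names are illustrative.

```python
from itertools import permutations
from math import comb


def c(n, k):
    """Binomial coefficient taken as zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def two_colour_ways(r1, r2, T):
    """Number of arrangements of r1 white and r2 red balls giving T runs."""
    t, odd = divmod(T, 2)
    if odd:
        return c(r1 - 1, t) * c(r2 - 1, t - 1) + c(r1 - 1, t - 1) * c(r2 - 1, t)
    return 2 * c(r1 - 1, t - 1) * c(r2 - 1, t - 1)


def runs(seq):
    """Total number of runs in a sequence of colours."""
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def brute_force(r1, r2):
    """Count distinct arrangements by number of runs, by direct enumeration."""
    counts = {}
    for seq in set(permutations("W" * r1 + "R" * r2)):
        counts[runs(seq)] = counts.get(runs(seq), 0) + 1
    return counts


r1, r2 = 4, 3
formula = {T: two_colour_ways(r1, r2, T) for T in range(2, r1 + r2 + 1)}
formula = {T: n for T, n in formula.items() if n}
print(formula, "total:", sum(formula.values()))
```

Dividing each count by $\binom{r}{r_1}$ gives the probabilities.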


Now let there be $r_1$ white, $r_2$ red and $r_3$ black balls, and suppose it is required to find the probability distribution of

$$T = t_1 + t_2 + t_3,$$

where $t_1$ is the number of white groups, $t_2$ the number of red and $t_3$ the number of black groups respectively. The number of ways in which $r_i$ ($i = 1, 2, 3$) can be split into $t_i$ groups is $\binom{r_i-1}{t_i-1}$, so the total number of ways is $\prod_{i=1}^{3}\binom{r_i-1}{t_i-1}$.

We now consider the $t_i$ as units and look for the number of ways in which they can be arranged along a line so that no two like colours are together. Following Whitworth, take the $t_2$ and $t_3$ groups and put them down in any order. Suppose that there is a total of $x$ contacts of the same colour (RR or BB). There will be $t_2 + t_3 - 1$ total contacts so that the number of RB or BR contacts will be $t_2 + t_3 - 1 - x$. Now take up the $t_1$ white groups. $x$ of them will have to go between the RR and BB contacts, leaving $t_1 - x$ to go between the BR and RB contacts or at either end of the line. This can be done in

$$\binom{t_2 + t_3 + 1 - x}{t_1 - x}$$

ways. The number of ways in which $t_2$ and $t_3$ can be arranged to produce $x$ contacts of the same colour is the number of ways in which $t_2$ and $t_3$ can be arranged to produce $t_2 + t_3 - x = G$ (say) runs. The number of ways for this, depending on whether $G$ is even or odd, is quoted above. The total number of ways of producing $T = t_1 + t_2 + t_3$ will be then

$$\sum \prod_{i=1}^{3}\binom{r_i-1}{t_i-1} \sum_x \binom{t_2 + t_3 + 1 - x}{t_1 - x}\left[P\{G = t_2 + t_3 - x\}\binom{t_2 + t_3}{t_2}\right],$$

where the inner sum is taken over all possible values of $x$ and the outer sum is over all possible partitions of $T$. There are restrictions on both $x$ and $T$ in order that the combinatorial coefficients shall have sense, but these are obvious in any calculation. The number in the fundamental probability set is

$$r!\bigg/\prod_{i=1}^{3} r_i!,$$

where $r$ is the sum of the $r_i$. The probability of obtaining a given number of runs is just the ratio of the two expressions.

3. The distributions for four and more colours can be built up successively from the distribution for three. Let there be $r_i$ ($i = 1, 2, \dots, k$) balls of each colour, with $r$ the sum of the $r_i$. Let there be $t_i$ ($i = 1, 2, \dots, k$) groups respectively and suppose that

$$p(T) = P\left\{\sum_{i=1}^{k} t_i = T\right\}$$

is required. The number of ways of forming $T$ groups is $\prod_{i=1}^{k}\binom{r_i-1}{t_i-1}$. Arrange the $T' = \sum_{i=1}^{k-1} t_i$ groups in any order and let there be $x$ total contacts of self-colours, i.e. $x$ is the total of white-white, red-red, black-black, ... contacts. Since there are $T' - 1$ contacts altogether, there will be $T' - 1 - x$ contacts of different colours. Take the $t_k$ groups which so far have not been arranged. Put $x$ of these in the $x$ self-colour contacts and arrange the remaining $t_k - x$ in the $T' - 1 - x$ different-colour contacts, or at the ends of the line. The number of ways of doing this will be $\binom{T' + 1 - x}{t_k - x}$. The number of ways in which $T'$ elements can be arranged to form $T' - x$ groups is obtained from the distribution of runs of $k - 1$ colours and we have

$$p(T)\,\frac{r!}{\prod_{i=1}^{k} r_i!} = \sum \prod_{i=1}^{k}\binom{r_i-1}{t_i-1} \sum_x \binom{T' + 1 - x}{t_k - x}\left[P\{T' - x\}\,\frac{T'!}{\prod_{j=1}^{k-1} t_j!}\right],$$

where the first sum is over all possible $k$-partitions of $T$ and the second sum over all possible $x$. The restrictions on both $x$ and the partitions of $T$ become obvious in calculation, and the probabilities can be calculated surprisingly quickly. Using the formulae of this and the preceding section the distributions for $r = 3, \dots, 12$ and $k = 3, 4$ were found and are given in Tables 1 and 2. More extensive calculations did not appear profitable in the light of approximations discussed later.
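The three-colour construction of §2 can be sketched in code and checked by enumeration. The sketch splits each colour into groups, arranges the red and black groups, and then inserts the white groups so that every same-colour contact is broken. The function names are illustrative, and the two-colour count is the standard one quoted in §2.

```python
from itertools import permutations
from math import comb


def c(n, k):
    """Binomial coefficient taken as zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def two_colour_ways(a, b, G):
    """Arrangements of a units of one colour and b of another giving G runs."""
    t, odd = divmod(G, 2)
    if odd:
        return c(a - 1, t) * c(b - 1, t - 1) + c(a - 1, t - 1) * c(b - 1, t)
    return 2 * c(a - 1, t - 1) * c(b - 1, t - 1)


def three_colour_ways(r, T):
    """Arrangements of r = (r1, r2, r3) balls of three colours giving T runs."""
    r1, r2, r3 = r
    total = 0
    for t1 in range(1, r1 + 1):
        for t2 in range(1, r2 + 1):
            t3 = T - t1 - t2
            if not 1 <= t3 <= r3:
                continue
            # ways of splitting each colour into its groups
            splits = c(r1 - 1, t1 - 1) * c(r2 - 1, t2 - 1) * c(r3 - 1, t3 - 1)
            # x same-colour contacts among the red and black groups must each
            # receive a white group; the rest go into the remaining slots
            inner = sum(
                c(t2 + t3 + 1 - x, t1 - x) * two_colour_ways(t2, t3, t2 + t3 - x)
                for x in range(t1 + 1)
            )
            total += splits * inner
    return total


def brute_force(r):
    """Count distinct arrangements by number of runs, by direct enumeration."""
    balls = "W" * r[0] + "R" * r[1] + "B" * r[2]
    counts = {}
    for seq in set(permutations(balls)):
        T = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
        counts[T] = counts.get(T, 0) + 1
    return counts


r = (2, 2, 2)
table = {T: three_colour_ways(r, T) for T in range(3, sum(r) + 1)}
print(table)
```

For two balls of each colour the counts sum to $6!/(2!\,2!\,2!) = 90$, the size of the fundamental probability set.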


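4. The moments of $T$, the total number of runs of all colours, follow directly from first principles. Define a characteristic random variable $\alpha_t$ which has the property that it is unity when the balls on either side of the $t$th gap are the same colour and zero when they are of different colours. Define $S$ by

$$S = r - T = \sum_{t=1}^{r-1}\alpha_t.$$

The moments of $S$ can now, in principle, be written down. Thus

$$E(\alpha_t) = \sum_{i=1}^{k}\frac{r_i(r_i-1)}{r(r-1)} \quad\text{and}\quad E(S) = \frac1r\sum_{i=1}^{k} r_i(r_i-1).$$

We shall write generally

$$F_j = \sum_{i=1}^{k} r_i(r_i-1)\cdots(r_i-j+1),$$

so that in this notation $r\,E(S) = F_2$. Again

$$E[S(S-1)] = E\left[\left(\sum_{t=1}^{r-1}\alpha_t\right)^2 - \sum_{t=1}^{r-1}\alpha_t^2\right].$$

Because $\alpha_t$ is a characteristic random variable

$$E(\alpha_t^2) = E(\alpha_t),$$

but it will be necessary to divide the double sum into two parts:

$$\sum_{t \ne u}\alpha_t\alpha_u = 2\sum_{t=1}^{r-2}\alpha_t\alpha_{t+1} + \sum_{|t-u| \ge 2}\alpha_t\alpha_u.$$

The expected values of the products are

$$E(\alpha_t\alpha_{t+1}) = \sum_i\frac{r_i(r_i-1)(r_i-2)}{r(r-1)(r-2)}$$

and

$$E(\alpha_t\alpha_u) = \frac{\sum_i r_i(r_i-1)(r_i-2)(r_i-3) + \sum_{i \ne j} r_i(r_i-1)\,r_j(r_j-1)}{r(r-1)(r-2)(r-3)} \quad (|t-u| \ge 2),$$

there being $2(r-2)$ products in which the two gaps are adjacent and $(r-2)(r-3)$ in which they are not. Hence

$$r(r-1)\,\mu'_{(2)}(S) = F_2(F_2 - 2) - 2F_3.$$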

By a similar process we obtain the third and fourth facto: 
79 ha) = E((F,— 2) (F, — 4) — 61,5 — 3) 
19 ugy(S) 


rial moments of S which SN 
OF, - 107, 
7 AGA 2) 054) 05-6) 2E, — 8) (m. 5) 4 aor d 9 

+ 405,0, — 3) 12/05 — 3). 84F; — 296p, — 1207. 
5. Two approaches to the limit are possible. First fix $k$, the number of colours, and let $r$, the total number of balls, increase without limit. Then we have that

$$\frac{S - F_2/r}{\sigma_S}$$

is in the limit a unit normal variable, where

$$\sigma_S^2 = \frac{F_2(r-3)}{r(r-1)} + \frac{F_2^2}{r^2(r-1)} - \frac{2F_3}{r(r-1)}.$$

This result follows by generalizing Mood's procedure for runs of one colour. As an alternative we can fix the possible numbers of each colour, i.e. we can put an upper bound on $r_i$, and let $k$ and therefore $r$ tend to infinity. Under these conditions we have approximately

$$\mu'_{(w)}(S) \simeq \left(\frac{F_2}{r}\right)^w,$$

since other sums in which some or all of the subscripts are equal will be of lower order. Accordingly if we put $\lambda = F_2/r$, then

$$\mu'_{(w)}(S) \to \lambda^w,$$

which is the $w$th Poisson factorial moment. This indicates that $S$ tends in the limit to be distributed as a Poisson variable with parameter $\lambda$. A rigorous proof of this can be given along the lines already set out elsewhere for moments of a similar structure (Barton, 1957). We have therefore that as $k$ (and $r$) increase without limit the distribution of $S = r - T$, where $T$ is the total number of runs, tends to that of Poisson's binomial limit with parameter $F_2/r$. In practice this limit will not be reached until $k$ is large.


6, Following Aitken (1939) we define the factorial cumulants xo by the relation 
t A0)! 
s = S —Kü 
K, = 7 i 
wl : Es i 
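where $\kappa_i$ is the ordinary moment cumulant. The first four of these relations are

$$\kappa_1 = \kappa_{(1)}, \quad \kappa_2 = \kappa_{(2)} + \kappa_{(1)}, \quad \kappa_3 = \kappa_{(3)} + 3\kappa_{(2)} + \kappa_{(1)}, \quad \kappa_4 = \kappa_{(4)} + 6\kappa_{(3)} + 7\kappa_{(2)} + \kappa_{(1)}.$$

From these relations we may find the factorial cumulants of $S$: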
-———— 
K,(8) = * , K97- (r-l!) 
orf, Fy + 28s 10772]. 


: [4F$— 12:3 4-8. 
e TN EE 
8 t 
Xpression for ky (S) is lengthy an d we do 
A- Ri ^7 Ful” 


Kg (S) = 
not reproduce it here. Write 


Th 
I 


Whence eon 
“Us , 1 [49 — 6A,(0A — 3) + 100 ;* AM 
hn Ralf} = 2 a Kols) = ger») 


t= 
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These cumulants simplify considerably if we let 


i = A+) 
for in this case Ay = AED, 
Thus for equal numbers of each colour we have 


j Ae 
E inn M d Jeanne] 
sm. Kin Ty ps Fut neca fe ope -30-5 ges 


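If we consider a simple binomial expression, say

$$(q + p)^r, \quad\text{where}\quad \lambda = rp,$$

the factorial cumulants are

$$\kappa_{(1)} = \lambda, \quad \kappa_{(2)} = -\frac{\lambda^2}{r}, \quad \kappa_{(3)} = \frac{2\lambda^3}{r^2}, \quad \kappa_{(4)} = -\frac{6\lambda^4}{r^3}.$$

It will be noted that the leading term in each factorial cumulant of $S$ is the same as the corresponding factorial cumulant of the binomial in the four cases given.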


7. The similarity between the first four cumulants of the distribution of $S$, when the numbers of each colour are the same, and those of the simple binomial, coupled with the fact that two possible distributions under conditions analogous to those of proceeding to binomial limits are the normal distribution and Poisson, suggests that a binomial distribution may be a useful approximation to the distribution of $S$. The case $r = 12$ with four colours, each three ($R$) in number, was considered. $S$ can take values $0, 1, 2, \dots, 8$, and

$$\kappa_1(S) = R - 1 = 2, \qquad \kappa_2(S) = \frac{(R-1)(r-R)}{r-1} = \frac{18}{11}.$$

Three approximations to the distribution of $S$ are now considered. First we let the binomial index equal 8, and fit the binomial

$$(q + p)^8 = (\tfrac34 + \tfrac14)^8,$$

the '$p$' of the binomial being found from $8p = \kappa_1 = 2$. This is approximation I of Table 3.

Secondly, we equate the first two moments of $S$, $R - 1$ and $(R-1)(r-R)/(r-1)$, to the first two binomial moments, which is equivalent to taking

$$p = \frac{R-1}{r-1} = \frac{2}{11}$$

with index $r - 1 = 11$. This is approximation II. Thirdly, we may calculate areas corresponding to a normal curve with the correct mean and variance of $S$. This is approximation III. The approximations are compared with the true distribution in Table 3, the normal curve area in the first group representing the whole left-hand tail. Approximation II, which equates the first two moments of $S$, is clearly the best, but no mistake is likely in a test of significance whichever approximation is used.
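The three approximations for the case $r = 12$ with four colours of three balls each can be computed directly. This is a minimal sketch: approximation II is taken to be the binomial with index $r - 1 = 11$ and $p = (R-1)/(r-1)$, which is what equating the mean $R - 1$ and variance $(R-1)(r-R)/(r-1)$ implies, and approximation III uses unit-width groups centred on the integers with the whole left-hand tail in the first group. The names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

R, k = 3, 4                     # three balls of each of four colours
r = R * k
mean = R - 1                    # E(S)
var = (R - 1) * (r - R) / (r - 1)
s_max = r - k                   # largest possible value of S


def binomial(n, p):
    """Probabilities of 0..n successes for a binomial with index n."""
    return [comb(n, s) * p ** s * (1 - p) ** (n - s) for s in range(n + 1)]


approx_I = binomial(s_max, mean / s_max)                    # index 8, p = 1/4
approx_II = binomial(r - 1, (R - 1) / (r - 1))[: s_max + 1]  # index 11, p = 2/11

normal = NormalDist(mean, sqrt(var))
approx_III = [normal.cdf(0.5)] + [
    normal.cdf(s + 0.5) - normal.cdf(s - 0.5) for s in range(1, s_max + 1)
]

for s in range(s_max + 1):
    print(s, f"{approx_I[s]:.4f}", f"{approx_II[s]:.4f}", f"{approx_III[s]:.4f}")
```

The printed columns can be set beside the rows of Table 3 for approximations I, II and III.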
7. Although it is possible to show the similarity between the factorial binomial cumulants and those of the distribution of $S$ in a simple way only when there are equal numbers of each colour, the approximations to the distribution of $S$ are so good for this case that one might expect them to be reasonable for the case when the numbers of each colour are not the same. We found that this was so. It is supposed there are 6 balls of one colour, 4 of another and 1 of each of two other colours. Thus if we consider approximation I and assume

$$n = 8, \qquad \kappa_1 = 3.5 = 8p,$$

the binomial is

$$(\tfrac{9}{16} + \tfrac{7}{16})^8.$$
Table 3. Comparison of true distribution of S with three approximations ($r = 12$, $[3^4]$)

  S                     0       1       2       3       4       5       6       7       8

  Approximation I     0.1001  0.2670  0.3115  0.2076  0.0865  0.0231  0.0038  0.0004  0.0000
  Approximation II    0.1100  0.2689  0.2987  0.1992  0.0885  0.0275  0.0061  0.0010  0.0001
  Approximation III   0.1205  0.2274  0.3042  0.2274  0.0952  0.0222  0.0029  0.0002    -
  True                0.1118  0.2670  0.2966  0.2003  0.0903  0.0275  0.0057  0.0008  0.0001
| | 


Table 4. Comparison of true distribution of S with three approximations ($r = 12$, $[6\,4\,1^2]$)
a | p E | | | 
pe | 0 1 | € |] ® 4 | 5 | e hk & 8 
n | | | | | 
| 
PPro; | | nec m | 
Ap, ?Ximation T 0.2641 | 0-2567 | 01598 | 0-0621 | 0-0138 | 0:0013 
imation IT | 0.0083 | 0-0563 | 0-1654 | 02714 | 0:2055 0.1632 | 0-0564 | 0-0091 | 0-0002 
14 | paras | 0-1590 | 0.0562 | 0:0112 | 0:0073 
0-0009 


T TOXimat; 
tie nation TIT | 0-0126 | 0-0552 | 01599 | 027 
03673 | 0-1582 | 0:0567 | 0-0104 


0:0054 


Ap 
À Proximation I 
0-0100 | 0-0624 | 0:1698 
| 0-1688 haek 


For approximation II, equating the first two moments of $S$ to those of a binomial gives the binomial

$$(0.51299 + 0.48701)^n,$$

and a fractional value of the index $n$ was used to calculate the probabilities. In the third approximation it is assumed that $S$ is normally distributed with mean 3.5 and variance 79/44. The agreement is surprisingly good considering that $r$ is small and that there is a disparity between the numbers of each colour. These results would suggest that, whatever the composition of the numbers, for $r > 12$ the normal approximation (III) will be adequate for tests of significance.


8. An extension of the theory of runs can be used in what we might call a test for persistence of type. Assuming the $r$ events of $k$ possible types, the null hypothesis will be that the $r$ events are a sample from a multinomial, the probability of the $i$th type of which is $p_i$. This is to say that under $H_0$ we assume

$$P\{r_1, r_2, \dots, r_k\} = \frac{r!}{\prod_{i=1}^{k} r_i!}\prod_{i=1}^{k} p_i^{r_i},$$

with each of the $r!/\prod_{i=1}^{k} r_i!$ sequences being equally likely. The alternate hypothesis, $H_1$, might be that we have a simple Markoff chain in which the probability of getting an event of the $i$th type at any given drawing is greater if the immediately preceding drawing also is of the $i$th type and less otherwise. We may write these transition probabilities

$$p_{ii} = p_i(1 + \theta), \qquad p_{ij} = p_j(1 - \theta W) \quad (i \ne j),$$

where

$$W = \sum p_i^2\Big/\left(1 - \sum p_i^2\right).$$

Assuming $\theta = 0$ for the first drawing of the sample, the probability of any given sequence in which there are $T$ groups is

$$(1 + \theta)^{r-T}(1 - \theta W)^{T-1}\prod_{i=1}^{k} p_i^{r_i}.$$

If we now consider the conditional distribution under $H_1$ for a sequence of given composition of numbers of each type, this is equal to

$$K\left(\frac{1 - \theta W}{1 + \theta}\right)^T,$$

where $K$ is a factor of proportionality which does not depend on $T$, and

$$\frac{1 - \theta W}{1 + \theta} = \phi \quad\text{(say)}.$$

It follows that $T$ is sufficient for $\theta$ and that the likelihood ratio test for $\theta = 0$, or equivalently for the corresponding value of $\phi$, is a function of $T$. Thus the use of $T$ is equivalent to the likelihood ratio test in this case.


rat 


9. We have been interested in the 


m 
the simple stochastie problem of the 


i 
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the ki 

breaking strength of cement-mortar briquettes (Table 5). We mark the observations in order of magnitude, ties being decided by tossing a coin. The ranking is simply in order to enable the runs to be counted more easily. Each group will correspond to a different colour. The number of runs is 22 and

$$S = 25 - 22 = 3, \qquad E(S) = 4, \qquad \operatorname{var}(S) = \tfrac{10}{3}.$$

Assuming $S$ to be normally distributed we see that significance is not achieved, a result which, in this case, is confirmed by the orthodox $F$-test.


Table 5. Breaking strength of cement-mortar briquettes
(tension in lb., with rank in brackets)

  Group        1           2           3           4           5

           518 (5)     508 (3)     538 (12)    535 (9)     492 (1)
           560 (20)    574 (23)    544 (15)    540 (14)    506 (2)
           538 (11)    528 (6)     554 (18)    550 (17)    528 (7)
           510 (4)     534 (8)     579 (24)    555 (19)    536 (10)
           544 (16)    538 (13)    598 (25)    567 (21)    572 (22)

“ i wi is wi be th e 
: Although th 7si i uns test has little power this will not he casi 
for gh the analysis of variance runs Doo E Uonaider 


applicati ing elegant illustrati 
the : ions. We owe the following e'eg ; : 
to falls in the price of shares on the London Stock Exchange during the period 6 November 


December 1956, both dates inclusive. Five types of industrial activity (A, B Breweries and Distilleries, C Electrical Equipment and Radio, D Motor and Aircraft, E Oil) were chosen. The prices as given in The Times for eighteen businesses of each type were taken, and Table 6 shows, for each day, the type of industrial activity for which the greatest number of the eighteen showed a fall in price. In the few cases where there were equal numbers for two types, that type with the fewer rises in price was taken.
1 : in price of shares 
Table 8. Type of industrial activity showing greatest number of falls in P f 


7 Nov. B 4Dec. C 
- Nov. 4 13 Nov. B 20 Nov. E ef! ee. E 5 Dec. C 
8 = A 14 Nov. C 21 Nov. E 29 Nov. A 6 Dec. ? 
Due 2 15 Nov. C 22 Nov. E 30 Nov. E 7Dee. C 
in S 16 Nov. C 28 Nov. 4 lDe. E 8 Dec. 
SV ud 17 Nov. E 24 Nov. 


- and S —-r—T - 9. 
3. hav, » . Be zd gg eb T= an 
To = 25 with r4 = 4, rg - "p^ ^ € 


9rmulae we find 


5 . — 2) = 582 
5 E e. > 
R= Xr - 1) = 19 al 
p 2H, 4106 
Elr—3) Fi Bere , 
dissi = x papel m9 
s) = 488 
804 o(S) = 1241 and ii 
hat g-4(8) £17 >3. 
ri mind 
This normal deviate is significant, and we conclude therefore that the test is picking out the fact that during the Middle East crisis there was some persistence from day to day in the way in which different classes of shares were affected.


11. In nearly all statistical applications of the theory of runs the test of significance will be one-tailed. For example, in Wald & Wolfowitz's application of the two-colour runs distribution to the two-sample means test, a small number of runs could be held to indicate a separation of the population means. The critical region in their case will be the tail where $T$ is small or $S$ large. This is also true for the applications given above; for example, in the analysis of variance application a complete separation of the twenty-five observations into five groups would indicate the possibility of the five population means being different instead of equal under the null hypothesis. We choose therefore the lower tail of $T$ or the upper tail of $S$. The result when there is too great an alternation of colour and $T$ is significant at the upper tail has not so obvious a statistical interpretation.

Table 1. Three-colour runs ($S = r - T$)

(The probabilities are obtained by dividing the number tabled by the corresponding value of $r!/(r_1!\,r_2!\,r_3!)$.)


| 
Values of T 
à "oom à r! ; 
: à Hr! i B E | 
3 4 5 6 i 8 9 10 ul 12 
P 
Ch» 7 
A 2 2 90 6 Is 36 30 
TE 4 60 6 Is — 36 10 
i if go yo 38 AE 
E s 
: 2 2 2310 6 3 62 30 38 
> 3 1} W| A x 40 18 
5 B d io 6 24 42 30 3 
|? 1 1 42) 6 234 12 = ES 
814 
> 8 2] sole 3» to wo m T4 
1 2 2] 420) 6 30 90 150 120 24 
Box d 280 | 6 30 — 80 90 n 14 | 
JE n 16s 6 30 — 60 60 2 = 
6 | 
|o d 56 | 6 30 30 — — = 
DEN 
: 3 3 | 1690 | 6 36 150 360 510 444 m 
* 3 2 | 00 | 6 36 10 310 405 — 288 7 
? E B 756 120 240 — 232 96 6 
4 2 | 756 6 36 2 " 24 
< 4 1 | eos 6 36 (120 180 180 8 a 
a rj jo se 10 18 132 56 
5 B ag 2352 | e 36 80 100 30 — — 
AE Sip es p = 5 > T 
lg (iz 7 
t : 5 z0 1234 838 248 
4 3 3 6 42 202 580 1050 x a is 
= 2 la 42 192 510 870 Aa Ae 
5 g 9 3 2 470 752 692 2 
5 : ag 42 19 352 108 18 
po? i o 4 162 90 3? 7^ ae 
2 2 P 59 350 440 2 
^ 6 42 152 i 20 " 
7 T 1 |o 4» ww 250 240 mo? z 
a 7 t I$ a w c 7 = L 2 
\y a 1 6 42 42 = — 
764 480 
8 4 3 [mso | 6 48 266 — 900 an ae m ies o 312 
B Rs i EE 2 25 2 a j 
| 5 3 3 | 9240 6 48 256 840 es 1968 1548 702 135 
9 4 s| apo | o sæ 20 79 15 iso so 39 1 | 
5 3 2 4620 6 48 226 660 122 E Fm En 30 
a 9 1 | 29772 | 6 48 216 48° i 560 300 90 5 
7 $ J | g3r0 | 6 a 20 450 Go — 480 90  — = 
7 2 2 | 1980 | 6 48 186 480 — soo 280 (6 — — 
8 3 1] 1320 | 6 48 176 | A S - ad c 
9 : 1 495 | 6 4s 126 210 I a ES = 
à 1 umole 48 96 — ES | 
| 4 " 3018 — 0894 9036 7938 2320 p | 
5 4 4 | 3465 & g Se 1900 © zoq4 7388 0982 2826 — 7 
4650 6 s 3300 597 100 
ü 3 aloes | & s 8X VI Sa wo mea Se 1100 — 100 
S 8 Biwn]|m s mu D 32. qua ammo EAD m 222 
6 5 2 3639 5 312 1080 cep 20 3550 3130 710 70 
4 16632 6 54 2388 36: n E: 
7 2 54 302 1030 2350 1820 060 
: 138600 | 6 5 2 EE 4 a 
E 54 972 880 1120 540 110 22 
6 5 2), 7920) 6 5 ? oo 120. 1400 Mb Ge 
7 H l 5544 | 6 54 272 os 1008 1080 660 7 A I 
8 $ 1 39600 | 6 54 259 en 1008 84 210 7 i a 
> 3 2 | 2970 | 6 58 NS uo w 9 M o — = L-— 
NUS 2 1 1980 | 6 54 = 280 1 — d L O 6 
| eo | 6 54 We 7 es = gi 
m l q iso | &g Bà 72 "e ipeum ——— 
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Table 2. Four-colour runs (S = r— T) 


(The probabilities are obtained by dividing the number tabled by the corresponding value of $r!/(r_1!\,r_2!\,r_3!\,r_4!)$.)


Values of T 
: r! » i 
r Ty Te Ya Tå Tiri : = 
i " " 8 " $ 9 10 11 12 
p 
6/2 2 1 180 | 24 72 84 
$8 d 1 120 | 24 72 24 
7/2 2 2 1 630 | 94 108 252 246 
35 2 1 1 420 | 24 108 192 96 
à i I 2 210 | 24 108 72 6 
8 2 2 2 2 2520 24 144 504 984 864 
3 2 2 1 1680 | 24 144 444 684 384 
$ $ 1 a 1120 | 24 144 384 384 184 d 
48 1 840 | 24 144 324 294 54 
9|3 2 23 2 7560 | 24 180 780 2010 2880 1686 
8 3 9 1 5040 | 24 180 720 1560 1720 836 
4 2 2 1 3780 | 24 180 660 1320 1260 336 
2*5 Tr I 2520 | 94 180 600 870 660 186 | 
5 » 1 i 1512 | 24 180 480 600 216 12 
10 3 3 2 2 25200 24 216 1140 3720 7480 8416 4204 
3 3 3 1| 16800 | 24 216 1080 3120 5160 5016 2184 
* 2 2 2| 18900 | 24 216 1080 3330 6210 6006 1974 
4 3 2 1| 12600 | 24 216 1020 2730 4170 3306 1074 
5 2 95 |] 7560 | 24 216 900 2160 2736 1368 156 | 
|4 o6 3 3 6300 | 24 216 900 1740 1980 1116 324 | 
5 9i Y 1 5040 | 24 216 $40 1560 1536 768 96 
11 | 3 3 3 2| 92400 | 24 252 1584 6360 16680 27756 27408 12336 
4 3 2 2| 69300 | 24 252 1524 5820 14400 22056 18708 0516 
4 3 3 1, 46200 | 24 252 1464 5070 10720 14956 10848 3506 
5 2 2 2| 41580 | 24 252 1404 4950 11016 14184 8364 1380 
4 4 2 1| 34650 | 24 252 1404 4350 9000 10656 6768 2016 
| 9 3 2 1| 27720 | 24 252 1344 4200 7896 8484 4704 816 
5 4 1 1, 13860 | 24 952 1224 2910 4176 3384 1584 306 
6 2 2 1 13860 | 24 252 1164 3210 4920 3480 780 30 
& g 3 1 9240 | 24 252 1104 2460 2920 1980 480 ag zot 
41 
12 | 3 3 3 3 | 369600 | 24 288 2112 10176 33360 74016 109632 98688 yn 
|4 3 3 2 | 277200 | 24 288 2052 9486 29590 61916 83952 00038 1345) 
4 4 2 2 | 207900 | 24 288 1992 8796 26100 5m6  c2892 43128 — (4 | | 
5 3 2 2 | 166320 | 24 288 1932 8316 23856 44520 50494 30408 — 159, 
| 4 4 3 1, 138600 | 24 288 1932 — 7896 20580 35616 39312 25428 3% 
| 5 3 3 1, 110880 | 24 288 1872 7416 18616 30320 31104 17528 220 
|5 4 2 1| 83160 24 288 1812 6726 15966 23820 21384 10818 ^ HA 
,6 2 2 2 83160 | 24 288 1752 6876 17460 27120 92080 7020 ay 
|6 3 2 1, 5540 | 24 288 1692 5976 13060 17120 12780 4199 ^ 55 
5 5 1 1| 833204 | 24 288 1632 4656 8352 9024 330 2449 1 
6 4 1 1| 27720 | 94 288 1572 4386 7410 7629 4680 1590 | 
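Entries of Tables 1 and 2 for small $r$ can be checked by direct enumeration. The sketch below is illustrative only (the function name is hypothetical and brute force is practical only for small $r$): it lists every distinguishable arrangement of the coloured objects and tabulates the number of runs $T$.

```python
from collections import Counter
from itertools import permutations


def run_count_distribution(sizes):
    """Tabulate T over all distinguishable arrangements.

    sizes[i] is the number of objects of colour i.  The result maps each
    value of T to the number of arrangements having that many runs.
    """
    objects = [colour for colour, n in enumerate(sizes) for _ in range(n)]
    arrangements = set(permutations(objects))  # duplicates collapse in the set
    dist = Counter()
    for arr in arrangements:
        runs = 1 + sum(1 for p, q in zip(arr, arr[1:]) if p != q)
        dist[runs] += 1
    return dict(sorted(dist.items()))


three_colour = run_count_distribution((2, 2, 2))     # r = 6, total 90
four_colour = run_count_distribution((2, 2, 1, 1))   # r = 6, total 180
```

Dividing each count by the total number of arrangements, $r!/(r_1!\,r_2!\cdots)$, gives the probabilities, as stated in the table captions.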
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BINOMIAL SAMPLING SCHEMES AND THE CONCEPT 
OF INFORMATION 


By D. V. LINDLEY 
Statistical Laboratory, University of Cambridge 
Methods of sampling a binomial population in order to obtain a prescribed accuracy in the determination of the unknown proportion are discussed. The concept of information due to Shannon is used, and the relationship between it and the corresponding concept of Fisher's is investigated and used to explain certain features of the sampling schemes.

1. Consider random sampling from a population in which each individual can be classified as A or not-A, the unknown proportion of A's in the population being $\theta$. The problem arises of how much sampling should be done in order to obtain sufficient knowledge, in some sense, about the value of $\theta$. In a previous paper (Lindley, 1956) we have proposed a solution using the idea of information, and in the present paper we discuss in detail the application of this idea to binomial sampling.
2. It is supposed that at any stage of the sampling the knowledge of $\theta$ can be expressed by a probability distribution $p(\theta)$. If this is so the quantity

\[ I_\theta = I(p(\theta)) = \int p(\theta)\,\log p(\theta)\,d\theta \tag{1} \]

provides a convenient measure of the amount of information about $\theta$; indeed, accepting a few mild requirements on the properties that 'amount of information' should possess, it has been shown by Shannon that $I$ is, apart from an arbitrary multiplying constant, unique. (This constant is incorporated in the arbitrariness of the base of the logarithm.) Details will be found in the paper already cited. The effect of sampling will be to alter $p(\theta)$, according to Bayes's theorem, and so to change $I_\theta$. The rule we propose is: continue sampling until $I_\theta$ has reached some prescribed value.

To facilitate the calculations it will be supposed that $p(\theta)$ belongs to the family

\[ p_{ab}(\theta) = \theta^{a-1}(1-\theta)^{b-1}\,\Gamma(a+b)/\{\Gamma(a)\,\Gamma(b)\}, \tag{2} \]

with $a$ and $b$ positive. This family of densities has the property that if the prior distribution is $p_{ab}(\theta)$ then the posterior distribution, after a single binomial trial has been performed, is $p_{a+1,b}(\theta)$ or $p_{a,b+1}(\theta)$ according as the result of the trial was A or not-A respectively. This fact permits a representation of the sampling schemes on a diagram with axes giving the values of $a$ and $b$. Simple calculations show that

\[ I(p_{ab}(\theta)) = \ln[\Gamma(a+b)/\{\Gamma(a)\,\Gamma(b)\}] + (a-1)[\Psi(a) - \Psi(a+b)] + (b-1)[\Psi(b) - \Psi(a+b)]. \tag{3} \]

Natural logarithms have been used in (1) and $\Psi(x) = d\ln\Gamma(x)/dx$.

Application of the sampling rule proposed involves the sampling continuing until the values of $a$ and $b$ obtained are such that (3) is equal to the prescribed value. We defer consideration of the details of this for a moment in order to note that the integral (1) is not invariant under a change of description of the parameter values. Specifically if $\phi = \phi(\theta)$ is a monotone function of $\theta$ then the probability distribution, $q(\phi)$, of $\phi$ will be given by

\[ q(\phi)\,d\phi = p(\theta)\,d\theta, \tag{4} \]

and

\[ I_\phi = I(q(\phi)) = \int q(\phi)\,\log q(\phi)\,d\phi = \int p(\theta)\,\log\left\{p(\theta)\,\frac{d\theta}{d\phi}\right\}d\theta = I_\theta + \int p(\theta)\,\log\frac{d\theta}{d\phi}\,d\theta. \tag{5} \]


It follows that if a sampling scheme is adopted in which we try to obtain a prescribed amount of information about $\phi$, it will, in general, be different from a scheme relevant to $\theta$. We consider three schemes depending on the choice of $\phi$.


3. Information about $\theta$. The information about $\theta$ is given by (3). This expression, as it stands, is too complicated to be easily understood but considerable simplification is possible by use of the standard asymptotic formulae

\[ \ln\Gamma(x) \sim \ln\sqrt{2\pi} - x + (x - \tfrac12)\ln x, \]

Stirling's formula, and

\[ \Psi(x) \sim \ln x - 1/(2x). \]

We obtain

\[ I(p_{ab}(\theta)) \sim \tfrac12\ln\{(a+b)^3/(2\pi ab)\} - \tfrac12, \]

for large values of both $a$ and $b$. It is, however, worth noting that both the asymptotic formulae are remarkably accurate, and that therefore when we refer to large $a$ and $b$ we often only mean that they are both greater than 5. It follows that the boundary in the $(a, b)$ diagram is a curve of the form

\[ (a+b)^3 = Aab, \tag{6} \]

where $A$ is a constant dependent on the amount of information required. Thus if the prior distribution is such that the point $(a, b)$ lies to the 'south-west' of the curve in the positive quadrant, sampling is continued until the curve is crossed. The general features of the boundary (drawn roughly in Fig. 1) and its form for small $a$ and $b$ have been discussed previously (Lindley, 1956). The main point of interest is the manner in which the boundary approaches the origin when either $a$ or $b$ is small; the approach is faster than even (6) would suggest. This agrees with the 'common-sense' feeling that if the same thing continually happens, say the sun rises each morning, then we are much better informed than we would be if there was known to be even a single non-occurrence.
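The agreement between the exact expression (3) and its asymptotic form can be inspected numerically. The sketch below is an illustration under stated assumptions, not the author's computation: it restricts $a$ and $b$ to positive integers so that $\Psi$ can be evaluated from harmonic sums with the standard library alone, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Psi(n) for a positive integer n: -gamma + 1 + 1/2 + ... + 1/(n-1)."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def info_theta_exact(a, b):
    """Equation (3): Shannon information of the density p_ab (natural logs)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return (log_norm
            + (a - 1) * (digamma_int(a) - digamma_int(a + b))
            + (b - 1) * (digamma_int(b) - digamma_int(a + b)))


def info_theta_asymptotic(a, b):
    """Large-a, large-b form: (1/2) ln{(a+b)^3 / (2 pi a b)} - 1/2."""
    return 0.5 * math.log((a + b) ** 3 / (2 * math.pi * a * b)) - 0.5


# Compare the two forms at a few integer points of the (a, b) diagram.
gaps = {(a, b): info_theta_exact(a, b) - info_theta_asymptotic(a, b)
        for (a, b) in [(5, 5), (10, 10), (20, 30)]}
```

For the uniform prior $a = b = 1$ the exact information is zero, and the gap between the two forms shrinks as $a$ and $b$ grow.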


4. Information about $\phi = 2\arcsin\sqrt\theta$. Here $d\theta/d\phi = \{\theta(1-\theta)\}^{1/2}$, and on evaluating the final term in (5) and adding it to $I_\theta$ given by (3) we obtain

\[ I_\phi = \ln[\Gamma(a+b)/\{\Gamma(a)\,\Gamma(b)\}] + (a-\tfrac12)[\Psi(a) - \Psi(a+b)] + (b-\tfrac12)[\Psi(b) - \Psi(a+b)]. \tag{7} \]

By use of the asymptotic formulae it can be shown that for large $a$ and $b$

\[ I_\phi \sim \tfrac12\ln\{(a+b)/(2\pi)\} - \tfrac12, \tag{8} \]

and hence the boundary of the region required is $a + b = $ constant, that is a fixed sample-size scheme. When, however, either $a$ or $b$ are small the asymptotic result does not provide a good approximation. Consider the situation when $a = 1$. If $b$ is large the asymptotic expressions yield

\[ I_\phi \sim \tfrac12\ln b - \tfrac12\gamma - 1 \tag{9} \]

instead of (8). ($\gamma$ is Euler's constant, equal to $-\Psi(1)$.) If it has been decided to use a prescribed sample size $n$ when $a$ is large, the same critical amount of information when $a = 1$ will be attained by a value of $b$ satisfying

\[ \tfrac12\ln b - \tfrac12\gamma - 1 = \tfrac12\ln\{n/(2\pi)\} - \tfrac12 \]

on equating (8) and (9). Hence

\[ b = n\,[e^{\gamma+1}/(2\pi)]. \tag{10} \]

The number in square brackets is about 0·77. In other words if $a = 1$ only about three-quarters of the sample size (every observation being not-A) is necessary to obtain the same amount of information about $2\arcsin\sqrt\theta$ as when $a$ is large. Thus, for example, if we are ignorant about $\theta$ we should presumably take $a = b = 1$: or if ignorant about $\phi$, $a = b = \tfrac12$; it makes little difference which. In either case we should obtain the requisite information about $\phi$ more quickly if every trial resulted in not-A than we would if A's and not-A's were mixed. This is the same phenomenon that was observed when acquiring information about $\theta$ but here it is not nearly so striking. The constant size boundary is adequate for values of $a$ (or symmetrically of $b$) greater than 5. Thus when $a = 5$ the constant corresponding to that in (10) is 0·96. The general form of the boundary is shown in Fig. 2.
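The constants 0·77 and 0·96 quoted above can be reproduced with a short calculation. The sketch below is an assumption-laden illustration: for fixed integer $a$ and large $b$ it applies the same asymptotic formulae to (7), giving $I_\phi \sim \tfrac12\ln b - \ln\Gamma(a) + (a-\tfrac12)\Psi(a) - a$, and equates this to (8). The general-$a$ expression is an extension of the $a = 1$ case worked in the text rather than a formula stated there, and the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Psi(n) for a positive integer n, from the harmonic sum."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def sample_size_ratio(a):
    """Ratio b/n at which I_phi for small integer a (b large) matches (8).

    Obtained by equating (1/2) ln b - ln Gamma(a) + (a - 1/2) Psi(a) - a
    with (1/2) ln{n / (2 pi)} - 1/2 and solving for b/n.
    """
    exponent = (2 * a - 1) + 2 * math.lgamma(a) - (2 * a - 1) * digamma_int(a)
    return math.exp(exponent) / (2 * math.pi)


ratio_a1 = sample_size_ratio(1)   # reduces to exp(gamma + 1) / (2 pi), as in (10)
ratio_a5 = sample_size_ratio(5)
```

The ratio rises towards 1 as $a$ increases, which is the sense in which the fixed sample-size boundary becomes adequate for $a$ greater than about 5.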


5. Information about $\psi = \ln\{\theta/(1-\theta)\}$. The consideration of a constant amount of information about $2\arcsin\sqrt\theta$ instead of $\theta$ is equivalent to asking for more information about $\theta$ when it is near 0 or 1 than when it is near $\tfrac12$. Such a tendency can often be exaggerated still further with profit. Suppose we imagine the continuous range of values of $\theta$ ($0 \le \theta \le 1$) replaced by a finite set of values so chosen that it would be important to distinguish between neighbouring values of $\theta$ but finer distinctions would be unimportant. Then if $\Delta\theta$ denotes this minimal distinction it will often happen that $\Delta\theta/\theta$ would be constant for $\theta$ near 0, that is the percentage accuracy is fixed. Over the whole range of $\theta$, $\Delta\theta/\{\theta(1-\theta)\}$ might be constant, a form which reduces to the previous for small $\theta$ and which is symmetrical in A and not-A. Let us find a transform, $\psi$, of $\theta$, such that the same distinctions in $\psi$ are uniform over its entire range; then

\[ \Delta\psi = \Delta\theta/\{\theta(1-\theta)\}, \]

or, in passing to the limit,

\[ \psi = \ln\{\theta/(1-\theta)\}. \tag{11} \]
The present transformation arises naturally in another way, though its connexion with the present study is not clear and may be accidental. It is well known that the only distributions admitting a single minimal sufficient statistic for all sample sizes are those belonging to the exponential family, that is with likelihood of the form

\[ f(x)\,g(\psi)\,e^{t\psi}, \]

where $f(x)$ is a function of the sample values $x$, and $t$ is the minimal sufficient statistic, $\psi$ being the parameter. The binomial distribution belongs to this family with $\psi = \ln\{\theta/(1-\theta)\}$. It is worth noting that for any number of repetitions of a binomial sampling scheme the statistics $\sum a_i$ and $\sum b_i$ are jointly sufficient. In most cases they will also be minimal sufficient but in exceptional cases this will not happen. Thus in the fixed sample-size scheme and Haldane's inverse binomial scheme (see below) $\sum b_i$ is alone minimal sufficient. This is because $\sum a_i$ can, in those two special cases, be deduced from the sizes of the schemes (which are not statistics) and, in the former case, $\sum b_i$.


Figs. 1-3. Sampling schemes for providing a prescribed amount of information about the parameters indicated. For the notation see text.
Considering, then, schemes which produce constant information for $\psi$, we have $d\theta/d\psi = \theta(1-\theta)$ and on evaluating the final term in (5) and adding it to $I_\theta$ given by (3) we obtain

\[ I_\psi = \ln[\Gamma(a+b)/\{\Gamma(a)\,\Gamma(b)\}] + a[\Psi(a) - \Psi(a+b)] + b[\Psi(b) - \Psi(a+b)]. \tag{12} \]

The asymptotic results used before yield

\[ I_\psi \sim \tfrac12\ln\{ab/(2\pi(a+b))\} - \tfrac12, \tag{13} \]

and hence the boundary of the region required is

\[ \frac{ab}{a+b} = N = \text{constant}, \tag{14} \]

a rectangular hyperbola with asymptotes $a = N$, $b = N$: only the portion in the quadrant $a > N$, $b > N$ will be meaningful in the present situation. This obtains provided $a$ and $b$ are greater than about 5. Now, from (12), we have

\[ \partial I_\psi/\partial a = a\Psi'(a) - (a+b)\Psi'(a+b), \]

which is positive since $\dfrac{d}{dx}[x\Psi'(x)] < 0$. Hence the information about $\psi$ always increases

when an additional observation is incorporated; with the two previous parameters this did not happen for small $a$ and $b$. Thus the difficulty met with $\theta$ and $\phi$ does not obtain here. From this it follows that (14) is a good approximation to the boundary provided $N \ge 5$, that is provided the required amount of information is sufficiently large. The general form is shown in Fig. 3. The case $N < 5$ does not seem of sufficient importance to warrant a numerical investigation.
In situations where it is appropriate to consider constancy of information about $\psi$, we would often have prior knowledge to suggest that $\theta$ is small, and it would probably be convenient to express this prior knowledge of $\theta$ by a distribution belonging to the family (2) with $a$ small and $b$ large. If it then happened that $a < N$ and $b \gg N$ the boundary (14) might conveniently be replaced by the asymptote to the hyperbola, namely $a = N$, without seriously affecting the approximation. This modified sampling scheme is that advocated by Haldane (1945) in just the circumstances we have been considering. He was concerned with providing a sampling scheme which would yield an approximately constant coefficient of variation for the estimate of $\theta$. It is easy to see that for small $\theta$ this is equivalent to requiring the standard deviation of $\psi$ to be constant when $n$ is large. That this will be so in large samples follows immediately from the considerations in the next section. It might be thought more natural to consider the transformation $\tau = \ln\theta$, constant information about which more closely resembles the situation considered by Haldane. This causes trouble for small $a$ and $b$ but for large values the boundary is again of the form $a = $ constant.

The three sampling schemes described here are quite different in character, as may most easily be seen by a glance at the three figures. The first and third represent the extreme cases: in the former one is concerned to know $\theta$, whatever its value, to a fixed accuracy, for example to two decimal places; in the latter the proportional accuracy is constant. The second, giving the fixed sample-size scheme, represents an intermediate requirement. Which to use depends essentially on the type of knowledge about $\theta$ that is needed. It is interesting that no loss function is involved in the approach, yet this choice involves considerations which are similar in form to those which would be advanced in deciding on a loss function. The reason is that Shannon's integral (1) expresses uniquely the idea of information; the only choice left, so to speak, is the $d\theta$ in the integral. On the other hand the loss function approach is distinct from one based on the integral (1), as has been demonstrated previously (Lindley, 1956). The reader can easily construct other schemes for special situations, using the methods given here.

6. The concept of information. The sampling rules introduced here have been constructed using the concept of information due to Shannon. Similar schemes that have previously been suggested have utilized the sampling variance of an efficient estimator as the criterion whereby we can judge whether $\theta$ has been found to sufficient accuracy. Essentially, then, the concept of information used has been Fisher's,

\[ \mathcal{I}_\theta = \mathscr{E}\left\{-\frac{\partial^2 L(x \mid \theta)}{\partial\theta^2}\right\}, \tag{15} \]

which is asymptotically equal to the inverse of the sampling variance referred to. ($L(x \mid \theta)$ denotes the logarithm of the likelihood of the sample point $x$ and $\mathscr{E}$ denotes the expectation taken with respect to $x$.) Now these two concepts of information are totally different in character.
There are two main differences. First, Shannon's concept involves the presence of a prior distribution for the unknown parameter whereas Fisher's does not. Secondly, Shannon's concept, when applied to construct a sampling scheme, uses the likelihood of the sample point $x$ obtained (in Bayes's theorem), whereas Fisher's concept uses the whole probability distribution of $x$ for fixed $\theta$; that is it involves not merely the $x$ that was obtained but also the $x$'s that might have been obtained. The first point is to the advantage of Fisher's notion, but once the idea of a prior probability for $\theta$ has been admitted the second point makes Shannon's concept simpler to apply.

Now despite these great differences we notice that the resulting sampling schemes are closely similar. Thus it is well known that for samples of a fixed size the amount of Fisherian information about $\phi$ is constant. In other words we can acquire a prescribed amount of Fisherian information about $\phi$ using a fixed sample-size scheme. But we have just seen that, at least when $a$ and $b$ are not too small, the same result obtains with Shannon's $I_\phi$. Again the Haldane scheme appears in just the same circumstances as it did originally when $\mathcal{I}_\psi$ was used. We now offer an explanation of why this is so.

Bayes's theorem says that

\[ p(\theta \mid x) = p(x \mid \theta)\,p(\theta)/p(x), \]

and hence, on taking logarithms,

\[ L(\theta \mid x) = L(x \mid \theta) + L(\theta) - L(x). \]

Differentiation twice with respect to $\theta$ gives

\[ -\frac{\partial^2 L(\theta \mid x)}{\partial\theta^2} = -\frac{\partial^2 L(x \mid \theta)}{\partial\theta^2} - \frac{\partial^2 L(\theta)}{\partial\theta^2}. \tag{16} \]

Suppose that $p(\theta \mid x)$ is approximately a normal distribution, as it is in our study when $a$ and $b$ are both large; then $I(p(\theta \mid x))$ is easily verified to be approximately $-\tfrac12\ln(2\pi e\sigma^2)$, where $\sigma^2$ is the posterior variance of $\theta$, given $x$, and will, of course, depend on $x$. It is to be distinguished from the variance of an efficient estimator of $\theta$. But, again, in view of the approximate normality, $\sigma^{-2} = -\partial^2 L(\theta \mid x)/\partial\theta^2$, and hence if we choose a sampling scheme, or equivalently a set of points $x$, such that $I(p(\theta \mid x))$ is constant it will follow that the left-hand side of (16) is constant. For such a sampling scheme if we take expectations over the sample space we shall have, from (16),

\[ -\frac{\partial^2 L(\theta \mid x)}{\partial\theta^2} = \mathscr{E}\left\{-\frac{\partial^2 L(x \mid \theta)}{\partial\theta^2}\right\} - \frac{\partial^2 L(\theta)}{\partial\theta^2}. \]

The second term on the right-hand side is negligible in large samples and hence

\[ -\frac{\partial^2 L(\theta \mid x)}{\partial\theta^2} = \mathcal{I}_\theta, \tag{17} \]

where $\mathcal{I}_\theta$ denotes Fisher's information, which is therefore constant. The same argument persists with $\phi$ for $\theta$. This demonstrates the equivalence of the two concepts when applied to schemes yielding constant information.

This result establishes the asymptotic equivalence of the posterior variance of $\theta$ and the variance of an asymptotically efficient estimator of $\theta$. It is therefore possible to construct the schemes using this latter variance with the asymptotically efficient estimate replacing the unknown value of $\theta$. Thus the variance of an asymptotically efficient estimator of $\theta$ is $\theta(1-\theta)/(a+b)$; replacing $\theta$ by $a/(a+b)$ yields $ab/(a+b)^3$, which if held constant gives the first of our schemes.
On the other hand in small samples $\mathcal{I}$ is a difficult concept to handle. Primarily this is because the whole sample space enters into its evaluation: in the binomial situation it is necessary to conduct a tedious enumeration of sample paths. But even when this has been done the solution of the equation $\mathcal{I} = $ constant presents considerable difficulties. The statistician who refuses to recognize and use prior distributions may still find our method acceptable because of this virtue of approximating to another approach. We conclude the paper by discussing the roles of prior distributions and Shannon's concept in the practical application of the schemes.


7. We first note that the prior distributions are not very obtrusive in the present study. They do not affect the boundary of the sampling scheme, but only the point in the $(a, b)$ diagram from which sampling should start. In all practical applications the present author has met the experimenter has a fair idea of the value of $\theta$ and this knowledge can usually be expressed approximately by a member of the family of distributions given by (2). For example if he says $\theta$ will fairly certainly lie between 10 % and 20 % then $a$ and $b$ can be found such that $p(\theta)$ is only effective over this range. (If fairly certain is interpreted as 95 % certain, then calculations suggest $a = 30$, $b = 175$.) Even if there is considerable prior knowledge, as in this case, experimenters often wish to be able to answer the question: 'What has this experiment to say about the value of $\theta$?' They wish to make a statement which does not depend on prior knowledge. Thus Haldane's scheme was devised for cases where $\theta$ is known to be small, yet the standard deviation is used to summarize the results and the prior knowledge ignored. In other words, prior considerations enter into the design but not the analysis. It is doubtful if the above question can be answered satisfactorily within a consistent framework. Fisherian methods attempt to provide an answer but their logical basis is not clear and it is possible to produce contradictions (Lindley, 1957). It would be possible to provide an answer if a meaning could be attached to the phrase 'ignorant about $\theta$'. Although numerous attempts have been made to do this, using the ideas of invariance, none have been conspicuously successful (Stein, 1955). Possibly ignorance about $\theta$ is as far distant into the past as certainty about $\theta$ is into the future. In the three situations discussed in the present paper it does not seem unreasonable to interpret the phrase 'ignorant about the parameter' to mean that the parameter has a uniform distribution over its permissible range of values. Such an interpretation is equivalent to using the following prior distributions: for $\theta$, $a = b = 1$; for $\phi$, $a = b = \tfrac12$; for $\psi$, $a = b = 0$. (The last is an improper distribution of the type considered by Jeffreys (1948).) It will be noted that even between the two extreme cases ($\theta$ and $\psi$) there is only a difference of effectively two observations: for observations A and not-A would suffice to alter the prior distribution from one to the other. The equivalent of two observations is hardly likely to be serious in practical problems. Consequently it may be possible to meet the experimenter's requirement by choosing $a$ and $b$ somewhere around these small values even if the prior knowledge he really possesses suggests otherwise.

The other practical consideration involves the determination of the constant representing the amount of information required, for example of $N$ in (14). This is easily done in the present context. For situations likely to be met in practice the relationship between the information $I_\theta$ and the posterior variance of $\theta$, $\sigma^2(\theta)$,

\[ I_\theta = -\tfrac12\ln\{2\pi e\,\sigma^2(\theta)\}, \tag{18} \]

can be used to evaluate the constant in terms of $\sigma^2(\theta)$, a more familiar concept. For example, if $\psi$ is being considered, we have from (13) and (18)

\[ -\tfrac12\ln\{2\pi e\,(a+b)/(ab)\} = -\tfrac12\ln\{2\pi e\,\sigma^2(\psi)\} \]

and hence $\sigma^2(\psi) = (a+b)/(ab) = 1/N$. In other situations a more delicate study of the posterior distribution involving more than the variance may be necessary.
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A STATISTICAL PARADOX 


By D. V. LINDLEY 
Statistical Laboratory, University of Cambridge 


An example is produced to show that, if $H$ is a simple hypothesis and $x$ the result of an experiment, the following two phenomena can occur simultaneously:
(i) a significance test for $H$ reveals that $x$ is significant at, say, the 5 % level;
(ii) the posterior probability of $H$, given $x$, is, for quite small prior probabilities of $H$, as high as 95 %.
Clearly the common-sense interpretations of (i) and (ii) are in direct conflict. The phenomenon is fairly general with significance tests and casts doubts on the meaning of a significance level in some circumstances.

We begin by giving the mathematical derivation of the example and later comment on it and the assumptions involved. Let $(x_1, x_2, \ldots, x_n)$ be a random sample from a normal distribution of mean $\theta$ and known variance $\sigma^2$. Let the prior probability that $\theta = \theta_0$, the value on the null hypothesis, be $c$. Suppose that the remainder of the prior probability is distributed uniformly over some interval $I$ containing $\theta_0$. We shall deal with situations where $\bar{x}$, the arithmetic mean of the observations, and a minimal sufficient statistic, is well within the interval $I$. The posterior probability that $\theta = \theta_0$, in the light of the sample, can be evaluated; it is

\[ \bar{c} = c\,\exp[-n(\bar{x}-\theta_0)^2/(2\sigma^2)]/K, \tag{1} \]

where

\[ K = c\,\exp[-n(\bar{x}-\theta_0)^2/(2\sigma^2)] + (1-c)\int_I \exp[-n(\bar{x}-\theta)^2/(2\sigma^2)]\,d\theta, \]

by Bayes's theorem. In virtue of the assumption about $\bar{x}$ and $I$ the integral can be evaluated as $\sigma\sqrt{(2\pi/n)}$.

Now suppose that the value of $\bar{x}$ is such that, on performing the usual significance test for the mean $\theta_0$ of a normal distribution with known variance, the result is significant at the $\alpha$ % point. That is, $\bar{x} = \theta_0 + \lambda_\alpha\sigma/\sqrt{n}$, where $\lambda_\alpha$ is a number dependent on $\alpha$ only and can be found from tables of the normal distribution function. Inserting this value in (1) we have the following value for the posterior probability that $\theta = \theta_0$:

\[ \bar{c} = c\,e^{-\frac12\lambda_\alpha^2}\big/\{c\,e^{-\frac12\lambda_\alpha^2} + (1-c)\,\sigma\sqrt{(2\pi/n)}\}. \tag{2} \]

(Note that $\bar{x} - \theta_0$ tends to zero as $n$ increases so that $\bar{x}$ will lie well within the interval $I$ for a sufficiently large $n$.) From (2) we see that as $n \to \infty$, $\bar{c} \to 1$. It follows that whatever the value of $c$, a value $n$ can be found, dependent on $c$ and $\alpha$, such that
(i) $\bar{x}$ is significantly different from $\theta_0$ at the $\alpha$ % level;
(ii) the posterior probability that $\theta = \theta_0$ is $(100 - \alpha)$ %.

This is the paradox. The usual interpretation of the first result is that there is good reason to believe $\theta \ne \theta_0$; and of the second, that there is good reason to believe $\theta = \theta_0$. The two interpretations are in direct conflict, and the conflict may apparently be made even stronger by noting that the $(100-\alpha)$ % confidence and fiducial intervals just exclude $\theta = \theta_0$. With $\alpha = 5$ we are 95 % confident that $\theta \ne \theta_0$ but have 95 % belief that $\theta = \theta_0$.

Before commenting on this analysis, let us first consider the assumptions involved. Many significance tests involve a situation in which the test criterion is asymptotically normally distributed with known variance, as is $\bar{x}$ in the example, and therefore the sample situation considered is in no way unusual. The only assumption that will be questioned is the assignment of a prior distribution of any type, and, in particular, of the form chosen. A paradox will only have been generated if we can show there exist situations where (a) a prior distribution of this form is reasonable, and (b) a significance test of the 'tail-area' type is commonly used.

Let us first consider the assignment of any prior probability. The argument for the use of prior probabilities has been put forward very cogently by Jeffreys (1948). His arguments have, to my mind, been reinforced by those of Ramsey (1931) and, more especially, Savage (1954). Savage's main contribution is as follows: he lays down certain axioms that a man should follow if he is to act in a 'rational' way, and defines a rational man to be a man who acts according to these axioms. The latter are quite mild in their form and would probably be agreed to by most statisticians. Savage then shows that a rational man must act as if he had a prior probability distribution and (if relevant) a utility function. It does not follow from this that any statistical inference need make overt mention of a prior distribution, but it does follow that no inference procedure should grossly contradict the existence of a prior distribution. (A mild contradiction may be allowable in the interests of simplicity.) Another way of looking at this result is to say that a probability distribution is a satisfactory measure of one's convictions about several hypotheses. For example, if to-day we say that our prior belief in one hypothesis is $\tfrac12$ it will mean the same as saying to-morrow that our prior belief in a different hypothesis is $\tfrac12$; just as a yard of material to-day measures the same as a yard of material to-morrow. If we are to use a significance level in a similar way, as Fisher (1956, p. 43) has suggested we can, and most statisticians do, we must establish a similar comparison property. 5 % to-day must mean the same as 5 % to-morrow. The example, we claim, shows that it need not.


So much for the general question of introducing a prior distribution. We now consider the particular form used in deriving the paradox. We first note that the phenomenon would persist with almost any prior probability distribution that had a concentration on the null value and no concentrations elsewhere. For example, if there is an amount $c$ at $\theta = \theta_0$ and the remainder is distributed according to a density $p(\theta)$, where $\int_I p(\theta)\,d\theta = 1$, an argument similar to that above, applied to the integral corresponding to that in (1), shows that $\bar{c}$ still tends to 1. It is sufficient that $p(\theta)$ does not tend to infinity too rapidly as $\theta$ tends to $\theta_0$. It is, however, essential that the concentration on the null value exists, and it is this that has to be considered. Again Jeffreys (1948) has discussed the point. Briefly, one argument is that the singling out of the null hypothesis is itself evidence that the value $\theta_0$ is in some way special. I would like to give two examples where this seems unquestionably correct.

The first is in genetics where $\theta$ is the linkage parameter between two genetic factors. If there is no linkage $\theta = \theta_0 = \tfrac12$, and we are concerned with developing a test to determine if there is any evidence for linkage. Now in this situation there is a considerable amount of prior knowledge. For it is known that there is linkage if, and only if, the two genes lie on the same chromosome. Consequently if there are $n$ chromosomes of approximately equal length, and if it seems reasonable to suppose that the gene is equally likely to be anywhere along the chromosomes' lengths, then it seems reasonable to suppose a prior probability of the order of $(n - 1)/n$ that the value of $\theta$ is $\tfrac12$. The particular numerical value of the prior probability is not so important here (though we note it is rather large) as is the fact that $\theta = \tfrac12$ is in a singular position and will arise for most positions of the two genes.

The second example arises in the telepathy experiments carried out by Soal & Bateman (1954). If no telepathic powers are present the experiment has a success ratio of $\theta_0 = \tfrac15$; otherwise the ratio differs from this value. A test for telepathy therefore should assign to $\theta_0 = \tfrac15$ a concentration of prior probability equal to one's prior belief that the subject has not got telepathic powers. This example is perhaps not as convincing as the genetical one because of the prejudices that exist in connexion with extra-sensory perception. My point in both these examples is that the value $\theta_0$ is fundamentally different from any value of $\theta \neq \theta_0$, however near to $\theta_0$ it might be. Unquestionably there exist situations (perhaps they are the more common) in which this is not so: where we are interested in testing the approximate validity of the null hypothesis, such as that the treatment has no (or very little) effect. This point has been discussed by Hodges & Lehmann (1954).

We now consider the paradox in these situations where the prior probability exists (by Savage's argument) and has a concentration on the null value. We first note that the expression of it in terms of fiducial or confidence limits used above is unjustified. The limits purport to be statements about the value of $\theta$ in the light of the experimental result when initially nothing is known about, or independent of knowledge of, $\theta$. The type of prior distribution used here (suggested by the practical circumstances of the problem) certainly does not correspond to ignorance about $\theta$. Thus we should not be surprised at the disagreement. The paradox merely serves as a warning that the confidence or fiducial type of statement should only be used in those circumstances where one is truly ignorant about the parameter. We have argued that this is not so in the telepathy or genetical examples.
The conflict between statements of a significance level and statements based on Bayes's theorem remains. Now in our example we have taken situations in which the significance level is fixed because, as explained above, we wish to see whether its interpretation as a measure of conviction about the null hypothesis does mean the same in different circumstances. The Bayesian probability is all right, by the arguments above; and since we now see that it varies strikingly with $n$ for a fixed significance level, in an extreme case producing a result in direct conflict with the significance level, the degree of conviction is not even approximately the same in two situations with equal significance levels. 5% in to-day's small sample does not mean the same as 5% in to-morrow's large one.

An alternative interpretation of the paradox is as follows. The posterior probability $\bar{c}$, given by (2), may be written

$$\bar{c} = \frac{c f_n}{c f_n + 1 - c}, \qquad \text{where} \qquad f_n = \sqrt{\frac{n}{2\pi\sigma^2}}\; e^{-\lambda_\alpha^2/2}$$

is the likelihood of $\theta_0$ on the evidence of the sample. Clearly $f_n \to \infty$ as $n \to \infty$, $\lambda_\alpha$ fixed. Hence at a fixed significance level the likelihood of the null hypothesis increases indefinitely with $n$. This appears to me to demonstrate, without reference to prior probabilities, the unsoundness of the suggestion that significance tests depend on the disjunction: either a rare chance has occurred or the null hypothesis is false (Fisher, 1956, p. 39). For the chance considered in a significance test is the chance of the observed event and other more extreme ones, whereas the chance of the observed event is measured by the likelihood function. These two chances behave quite differently. In fact, the paradox arises because the significance level
argument is based on the area under a curve and the Bayesian argument is based on the ordinate of the curve. However, the above interpretation through the likelihood makes no mention of alternative hypotheses which seem basic to any approach to the problem. The other approach to significance testing, due to Neyman & Pearson, does consider the use of alternative hypotheses and hence appears to give a reason for using the tail area, because this region is the best one in which to reject the null hypothesis at a specified level of significance. Therefore the occurrence of an observation in the region is an unusual event on the null hypothesis and less unusual on some alternative hypotheses. But the argument does not justify the practice of keeping the significance level fixed, nor does it take account of the fact that when the observation has been made we know, not that the observation lies in the region of significance, but that it has fallen exactly on the edge, and the likelihoods under the null and alternative hypotheses seem the relevant quantities to compare.

The paradox is not, in essentials, new, although few statisticians are aware of it. The difference between the two approaches has been noted before by Jeffreys (see, in particular, 1948, Appendix), who is the originator of significance tests based on Bayes's theorem and a concentration of prior probability on the null value. But Jeffreys is concerned to emphasize the similarity between his tests and those due to Fisher and the discrepancies are not emphasized. The same phenomenon was noticed by Lindley (1953) in decision theory language and some computations by Prof. Pearson in the discussion to that paper emphasized how the significance level would have to change with the sample size, if the losses and prior probabilities were kept fixed. (The discussion based only on the latter quantities is mathematically equivalent to one in decision theory language with zero-one losses.) The present note considers the situation where the significance level is fixed and the variation in posterior probability is evaluated, rather than the other way round.
The concept of a significance level has been used very successfully in practical problems of inference. One might now ask how this has come about. The answer has already been given by Jeffreys in the appendix already cited. Essentially it is because $\bar{c}$, as given by (2), tends to unity very slowly and, for moderate values of $n$, $\bar{c}$ may be less than $c$ at a prescribed significance level and the two concepts be in reasonable agreement. Let

$$A = \frac{c\,e^{-\lambda_\alpha^2/2}}{(1 - c)\sqrt{2\pi}}, \tag{3}$$

then

$$\bar{c} = \frac{A}{A + \sigma/\sqrt{n}},$$

and $\bar{c} \to 0$ as $\sigma/\sqrt{n} \to \infty$. Hence in a small experiment a significant result may give very strong reasons to doubt the null hypothesis. As an illustration we take $c = \tfrac12$ and use a two-sided test at 5% with $\sigma = 1$, so that $A = 0.0584$, and the table gives the value of $\bar{c}$ for different $n$. We see that for small samples ($n < 10$) the probability has decreased appreciably from its initial value of $\tfrac12$, giving cause to doubt the null hypothesis. For medium samples the probability has changed only a little, and by the time $n$ is about 300 a result which has reached 5% significance has not altered our belief in the null hypothesis at all. To reach the contrast put forward in the paradox it would be necessary to take $n$ about 100,000. Of course if $\sigma$ is smaller then smaller samples will suffice. For example, if we apply these numerical values to the Soal & Bateman problem (i.e. use the normal approximation to the binomial) we have $\sigma^2 = \tfrac15 \cdot \tfrac45 = 0.16$, and a sample of size about 48 has $\bar{c}$ equal to the original value of $\tfrac12$. An experiment of this type with a run of forty-eight trials which is significant at 5% would not alter our views on telepathy if initially we had an open mind on the problem. The normal approximation is not adequate for samples as low as 10, but it is clear
that only very short experiments would change our prior belief at all noticeably. A sufficiently long experiment significant at 5% would raise our belief that telepathy did not exist to 95%; quite a modest length compared with the 37,100 trials carried out with Mrs Stewart. The reader may verify that with $c = \tfrac12$ the posterior probability of the null hypothesis in the light of the experiments with Mrs Stewart (9410 successes) is exceedingly small. The evidence for Mrs Stewart's telepathic powers is rather strong.
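The verification left to the reader can be sketched numerically. The following is not part of the paper: it assumes the normal approximation to the binomial with $\theta_0 = \tfrac15$, assumes that the remaining prior weight $1 - c$ is spread uniformly over an interval of unit length so that (2) applies as written, and works in logarithms because the quantities are extremely small. The function name is illustrative.

```python
import math

def log10_posterior_null(successes, trials, theta0=0.2, c=0.5):
    """log10 of the posterior probability that theta = theta0, from (2),
    using the normal approximation to the binomial success ratio."""
    sigma2 = theta0 * (1.0 - theta0)      # variance of a single trial
    xbar = successes / trials             # observed success ratio
    # log of the null term  c * exp(-n (xbar - theta0)^2 / (2 sigma^2))
    log_null = math.log(c) - trials * (xbar - theta0) ** 2 / (2.0 * sigma2)
    # log of the alternative term  (1 - c) * sigma * sqrt(2 pi / n)
    log_alt = math.log(1.0 - c) + 0.5 * math.log(2.0 * math.pi * sigma2 / trials)
    # posterior = null / (null + alt), evaluated stably in logarithms
    big = max(log_null, log_alt)
    log_total = big + math.log(math.exp(log_null - big) + math.exp(log_alt - big))
    return (log_null - log_total) / math.log(10.0)

# Mrs Stewart: 9410 successes in 37,100 trials.
stewart = log10_posterior_null(9410, 37100)

# A run of 48 trials that is just significant at 5% (a hypothetical,
# non-integer success count placed exactly on the boundary).
boundary = 48 * (0.2 + 1.96 * 0.4 / math.sqrt(48))
short_run = 10 ** log10_posterior_null(boundary, 48)
```

Under these assumptions `stewart` comes out far below $-100$, while `short_run` stays close to the prior value of $\tfrac12$, in line with the two statements in the text.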


        n      c̄              n       c̄
        1    0.055            600    0.589
        2    0.076            800    0.623
        3    0.092          1,000    0.649
        4    0.105          2,000    0.723
        5    0.116          4,000    0.787
       10    0.156          6,000    0.819
       20    0.207          8,000    0.839
       40    0.270         10,000    0.854
       60    0.312         20,000    0.892
       80    0.343         40,000    0.921
      100    0.369         60,000    0.935
      200    0.453         80,000    0.943
      300    0.503        100,000    0.949
      400    0.539              ∞    1.000
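The tabulated values follow directly from (3) and the expression $\bar{c} = A/(A + \sigma/\sqrt{n})$. A minimal sketch, assuming $c = \tfrac12$, $\lambda_\alpha = 1.96$ and $\sigma = 1$ as in the illustration (the function name is not from the paper):

```python
import math

def posterior_null(n, c=0.5, lam=1.96, sigma=1.0):
    """Posterior probability of the null hypothesis for a result that is
    just significant, i.e. |xbar - theta0| = lam * sigma / sqrt(n)."""
    a = c * math.exp(-lam ** 2 / 2.0) / ((1.0 - c) * math.sqrt(2.0 * math.pi))
    return a / (a + sigma / math.sqrt(n))

# Print a few rows of the table: sample size and posterior probability.
for n in (1, 10, 100, 300, 1000, 10000, 100000):
    print(f"{n:>7d}  {posterior_null(n):.3f}")
```

Passing `sigma=0.4` gives the corresponding figures for the Soal & Bateman problem under the normal approximation.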
An advantage of the significance level statement is that it does provide some assessment of the truth of the null hypothesis using only the evidence provided by the experiment. It is, in effect, a convenient (though possibly misleading) summary of what the experimental result has to say about the null hypothesis. A similar assessment is available in a Bayesian analysis through the likelihood function. In the situation considered here the function is proportional to

$$\exp\left[-n(\bar{x} - \theta)^2/2\sigma^2\right]$$

regarded as a function of $\theta$. This, unlike the single number expressing the significance level, is a function and is therefore more difficult to understand. A reduction to a numerical value is possible provided the assessment of prior probabilities conditional on $\theta \neq \theta_0$ is made. For example, if it is uniform in the interval $I$ in these circumstances, then

$$\exp\left[-n(\bar{x} - \theta_0)^2/2\sigma^2\right] \Big/ \int_I \exp\left[-n(\bar{x} - \theta)^2/2\sigma^2\right] d\theta \simeq \sqrt{\frac{n}{2\pi\sigma^2}}\,\exp\left[-n(\bar{x} - \theta_0)^2/2\sigma^2\right]$$

is the quantity by which the prior odds, $c/(1 - c)$, in favour of $\theta = \theta_0$ must be multiplied in order to obtain the posterior odds. This quantity, or its logarithm, might be an acceptable substitute for the significance level. It is equal to Jeffreys's $K$, since he supposes $c = \tfrac12$.

The paradox serves to explain one puzzling feature of tests based on Bayes's theorem. Suppose an experimenter has continued sampling randomly until he has reached a result which is significant at some prescribed significance level $\alpha$, using a fixed-sample size significance test. That is, he has taken a sample $(x_1, \ldots, x_n)$ such that $\bar{x} = \theta_0 \pm \lambda_\alpha \sigma/\sqrt{n}$.
It is easy to show, by the law of the iterated logarithm, that this will happen with probability one whatever the value of $\theta$. Then the experimenter has, of course, cheated if he quotes the result as being significant at $\alpha$%, though, if the distribution theory were known, a valid significance test could be made. But it would not be that appropriate to a sample of fixed size $n$. On the other hand, it is easy to see that the likelihood of the observations $(x_1, \ldots, x_n)$ does not depend on the particular sequential stopping rule used and is, therefore, just the likelihood the experimenter would have obtained if the same sample had been generated by taking a sample of fixed size $n$. It follows that any significance test based on Bayes's theorem does not depend on the sequential stopping rule used, at least amongst a wide class of such rules. In the extreme case the experimenter can go on sampling until he has reached the significance level $\alpha$, and yet the fact that he did so is irrelevant to a Bayesian. In telepathy this is known as 'optional stopping': stopping when the results look most striking, that is, on a significance level criterion. The explanation is now clear. If $\theta \neq \theta_0$ the optional stopper will reach his desired point for small $n$ and $\bar{c} < c$. On the other hand, if $\theta = \theta_0$, the value of $n$ will be larger and $\bar{c} > c$. (These are average results, of course, and sometimes mistakes will be made.) The value of $\bar{c}$ is just what one would expect in the two cases and we see that the Bayesian will not on the average be in error in ignoring the stopping rule. It should now be possible to give a reliable assessment of those results in telepathy which have had objections raised against them on the grounds of optional stopping.
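The behaviour of the optional stopper can be illustrated by simulation. The sketch below is an illustration only, not the author's computation: it assumes normal observations with $\sigma = 1$, $\theta_0 = 0$, $c = \tfrac12$ and $\lambda_\alpha = 1.96$, caps the sample size so that runs under the null hypothesis terminate, and treats the stopping value of $\bar{x}$ as lying exactly on the significance boundary when evaluating $\bar{c}$ from (2).

```python
import math
import random

def posterior_null(n, c=0.5, lam=1.96, sigma=1.0):
    """Posterior probability of the null for a just-significant result."""
    a = c * math.exp(-lam ** 2 / 2.0) / ((1.0 - c) * math.sqrt(2.0 * math.pi))
    return a / (a + sigma / math.sqrt(n))

def stopping_time(theta, rng, theta0=0.0, sigma=1.0, lam=1.96, max_n=20000):
    """Sample until the fixed-sample-size test is first significant.
    Returns that n, or None if max_n observations were not enough."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(theta, sigma)
        if abs(total / n - theta0) >= lam * sigma / math.sqrt(n):
            return n
    return None

rng = random.Random(1)
# True mean one standard deviation away from the null value.
alt_stops = [stopping_time(1.0, rng) for _ in range(20)]
# Null hypothesis true: stopping is typically much later, or beyond the cap.
null_stops = [stopping_time(0.0, rng, max_n=2000) for _ in range(20)]
```

When $\theta \neq \theta_0$ the stopping values of $n$ are small and `posterior_null(n)` falls below $c$; when $\theta = \theta_0$ the runs that do stop tend to do so at much larger $n$, where $\bar{c}$ exceeds $c$.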


I am much indebted to Profs. Pearson and Barnard for helpful comments on the first draft of this paper.
STOCHASTIC CROSS-INFECTION BETWEEN TWO
OTHERWISE ISOLATED GROUPS 


By H. W. HASKEY 
South-east Essex Technical College* 


1. Bailey (1950) and Haskey (1954) have considered, from a stochastic standpoint, the spread of an infection among a single group of people into which one infectious individual was introduced, the group being isolated. The following is a generalization: There are now two groups of people, one with $n_1$ individuals and the other with $n_2$. At time $t = 0$ we have $h$ infectives introduced into the first group. At time $t$ we suppose there are, respectively, $r_1$ and $r_2$ susceptibles remaining. Let the chance of one of the susceptibles in the first group being infected in time $dt$ by an infective in the same group be $a r_1(n_1 + h - r_1)\,dt$; by an infective in the second group be $b' r_1(n_2 - r_2)\,dt$. The total chance of a fresh infection is then the sum of these quantities. Similarly, for the susceptibles in the second group, let the chances of infection from infectives of the same or of the other group be $b r_2(n_2 - r_2)\,dt$ and $a' r_2(n_1 + h - r_1)\,dt$. Writing $p(r_1, r_2)$ for the probability of $r_1$, $r_2$ susceptibles at time $t$ and considering the possible changes,

$$(r_1 + 1, r_2) \to (r_1, r_2), \qquad (r_1, r_2 + 1) \to (r_1, r_2), \qquad (r_1, r_2) \to (r_1, r_2)$$

in the ensuing time $dt$, we have, events in the groups being regarded as independent,

$$\begin{aligned}
\frac{dp(r_1, r_2)}{dt} = {} & -p(r_1, r_2)\,[a r_1(n_1 + h - r_1) + b r_2(n_2 - r_2) + a' r_2(n_1 + h - r_1) + b' r_1(n_2 - r_2)] \\
& + p(r_1, r_2 + 1)\,[b(r_2 + 1)(n_2 - r_2 - 1) + a'(r_2 + 1)(n_1 + h - r_1)] \\
& + p(r_1 + 1, r_2)\,[a(r_1 + 1)(n_1 + h - r_1 - 1) + b'(r_1 + 1)(n_2 - r_2)].
\end{aligned} \tag{1}$$

If the Laplace transform of $p(r_1, r_2)$ with respect to time is $q_\lambda(r_1, r_2)$, or $q(r_1, r_2)$ for brevity, we have

$$\begin{aligned}
q(r_1, r_2)\,[\lambda + f(r_1, r_2)] = {} & (r_2 + 1)\,[b(n_2 - r_2 - 1) + a'(n_1 + h - r_1)]\; q(r_1, r_2 + 1) \\
& + (r_1 + 1)\,[a(n_1 + h - r_1 - 1) + b'(n_2 - r_2)]\; q(r_1 + 1, r_2),
\end{aligned} \tag{2}$$

where

$$f(r_1, r_2) = (a r_1 + a' r_2)(n_1 + h - r_1) + (b r_2 + b' r_1)(n_2 - r_2). \tag{3}$$

This holds if $r_1 < n_1$, $r_2 < n_2$. Likewise

$$q(r_1, n_2)\,[\lambda + f(r_1, n_2)] = a(r_1 + 1)(n_1 + h - r_1 - 1)\; q(r_1 + 1, n_2) \qquad (r_1 < n_1), \tag{4}$$

$$q(n_1, r_2)\,[\lambda + f(n_1, r_2)] = (r_2 + 1)\,[b(n_2 - r_2 - 1) + a'h]\; q(n_1, r_2 + 1) \qquad (r_2 < n_2), \tag{5}$$

$$q(n_1, n_2)\,[\lambda + f(n_1, n_2)] = 1. \tag{6}$$

If the mean number uninfected in the group originally uninfected is required, the quantities $m(r_1, r_2) = r_2\, q(r_1, r_2)$ are needed, since $\sum\sum m(r_1, r_2)$ is the Laplace transform of this mean number. From (2) it follows that

$$m(r_1, r_2)\,[\lambda + f(r_1, r_2)] = j(r_1, r_2 + 1)\, m(r_1, r_2 + 1) + k(r_1 + 1, r_2)\, m(r_1 + 1, r_2), \tag{7}$$

and there are equations corresponding to (4), (5) and (6), where

$$\begin{aligned}
j(r_1, r_2) &= (r_2 - 1)\,[b(n_2 - r_2) + a'(n_1 + h - r_1)], \\
k(r_1, r_2) &= r_1\,[a(n_1 + h - r_1) + b'(n_2 - r_2)].
\end{aligned} \tag{8}$$

* Now at U.K. Atomic Energy Authority (Industrial Group H.Q.).
To see the nature of the solution we take $n_1 = 2$, $n_2 = 3$ and work backwards from $m(2, 3)$ to $m(0, 1)$, giving

$$m(2, 3) = \frac{3}{f(2,3) + \lambda}, \qquad m(1, 3) = \frac{3k(2,3)}{[f(2,3) + \lambda][f(1,3) + \lambda]},$$

$$m(0, 3) = \frac{3k(2,3)\,k(1,3)}{[f(2,3) + \lambda][f(1,3) + \lambda][f(0,3) + \lambda]},$$

$$m(2, 2) = \frac{3j(2,3)}{[f(2,3) + \lambda][f(2,2) + \lambda]},$$

$$m(1, 2) = \frac{3k(2,3)\,j(1,3)}{[f(2,3) + \lambda][f(1,3) + \lambda][f(1,2) + \lambda]} + \frac{3j(2,3)\,k(2,2)}{[f(2,3) + \lambda][f(2,2) + \lambda][f(1,2) + \lambda]}, \quad \text{etc.}$$

Any $m(r_1, r_2)$ can be written from the table below:


               r_2 = 3     r_2 = 2     r_2 = 1
    r_1 = 2    (2, 3)      (2, 2)      (2, 1)
    r_1 = 1    (1, 3)      (1, 2)      (1, 1)
    r_1 = 0    (0, 3)      (0, 2)      (0, 1)
by selecting all 'approved routes' from the cell $(2, 3)$ to the cell $(r_1, r_2)$. An 'approved route' starts from $(2, 3)$, runs along rows to the right and down columns. Corresponding to each approved route to $(r_1, r_2)$ there is a term of $m(r_1, r_2)[f(2,3) + \lambda]/3$ consisting of a product of fractions of the types

$$j(s_1, s_2)/[f(s_1, s_2 - 1) + \lambda] = \theta(s_1, s_2), \text{ say},$$

for motion to the right and

$$k(s_1, s_2)/[f(s_1 - 1, s_2) + \lambda] = \phi(s_1, s_2), \text{ say},$$

for motion down a column, from the cell $(s_1, s_2)$. There are as many terms to be added to give $m(r_1, r_2)$ as there are different approved routes from the cell $(2, 3)$ to the cell $(r_1, r_2)$.

The double sum $\sum_{r_1}\sum_{r_2} m(r_1, r_2)$, required to find the mean number of uninfected in the group originally uninfected, is the sum of contributions from all approved routes to every cell. These contributions can be arranged in batches, each being associated with one approved route to the cell $(0, 1)$. Hence

$$\sum_{r_1}\sum_{r_2} m(r_1, r_2) = 3\psi(2, 3)/[f(2,3) + \lambda],$$

where $\psi(j, k)$ is defined inductively by

$$\psi(j, k) = 1 + \theta(j, k)\,\psi(j, k - 1) + \phi(j, k)\,\psi(j - 1, k) \tag{10}$$

and $\psi(0, 1) = \psi(1, 0) = 1$. Equation (10) can be expanded and the double sum expressed in products like (9).
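The equivalence between the backward recursion (7) and the route sum built from (10) can be checked numerically at any fixed $\lambda$. The sketch below is a check under stated assumptions rather than part of the paper: it uses $n_1 = 2$, $n_2 = 3$, $h = 3$, $a = 0.8$, $b = 0.9$, $a' = 0.2$ and takes $b' = 0.2$, the value for which (3) reproduces the numerical table of $f(r_1, r_2)$ given in the text; the function names are illustrative.

```python
N1, N2, H = 2, 3, 3
A, B, A1, B1 = 0.8, 0.9, 0.2, 0.2   # a, b, a', b' (b' = 0.2 is an assumption)

def f(r1, r2):
    """Total infection rate in state (r1, r2), equation (3)."""
    return (A * r1 + A1 * r2) * (N1 + H - r1) + (B * r2 + B1 * r1) * (N2 - r2)

def j(r1, r2):
    """Coefficient for motion to the right (r2 falls by one), equation (8)."""
    return (r2 - 1) * (B * (N2 - r2) + A1 * (N1 + H - r1))

def k(r1, r2):
    """Coefficient for motion down a column (r1 falls by one), equation (8)."""
    return r1 * (A * (N1 + H - r1) + B1 * (N2 - r2))

def m_table(lam):
    """Solve (7) backwards from the starting cell (N1, N2)."""
    m = {}
    for r1 in range(N1, -1, -1):
        for r2 in range(N2, 0, -1):
            if (r1, r2) == (N1, N2):
                rhs = float(N2)
            else:
                rhs = 0.0
                if r2 < N2:
                    rhs += j(r1, r2 + 1) * m[(r1, r2 + 1)]
                if r1 < N1:
                    rhs += k(r1 + 1, r2) * m[(r1 + 1, r2)]
            m[(r1, r2)] = rhs / (lam + f(r1, r2))
    return m

def psi(r1, r2, lam):
    """Route sum defined inductively by (10); boundary terms vanish."""
    total = 1.0
    if r2 > 1:
        total += j(r1, r2) / (f(r1, r2 - 1) + lam) * psi(r1, r2 - 1, lam)
    if r1 > 0:
        total += k(r1, r2) / (f(r1 - 1, r2) + lam) * psi(r1 - 1, r2, lam)
    return total

lam = 1.0
direct = sum(m_table(lam).values())
via_routes = N2 * psi(N1, N2, lam) / (f(N1, N2) + lam)
```

The two quantities `direct` and `via_routes` agree to rounding error, which is the content of the statement that the double sum equals $3\psi(2, 3)/[f(2,3) + \lambda]$.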
When $n_1 = 2$, $n_2 = 3$, $h = 3$ and $a = 0.8$, $b = 0.9$, $a' = 0.2$, $b' = 0.2$, and the value of $f(r_1, r_2)$ at each cell is inserted within it, we have

               r_2 = 3     r_2 = 2     r_2 = 1
    r_1 = 2      6.6         8.2         8.0
    r_1 = 1      5.6         6.8         6.2
    r_1 = 0      3.0         3.8         2.8
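The entries of this table can be recomputed from (3). A minimal sketch, with the parameter values as keyword defaults; $b' = 0.2$ is taken because it is the value for which every tabulated entry is reproduced, and the argument names `a1`, `b1` stand for $a'$, $b'$:

```python
def f(r1, r2, n1=2, n2=3, h=3, a=0.8, b=0.9, a1=0.2, b1=0.2):
    """Total infection rate in state (r1, r2), equation (3)."""
    return (a * r1 + a1 * r2) * (n1 + h - r1) + (b * r2 + b1 * r1) * (n2 - r2)

# Rows r1 = 2, 1, 0 and columns r2 = 3, 2, 1, as in the printed table.
table = {(r1, r2): f(r1, r2) for r1 in (2, 1, 0) for r2 in (3, 2, 1)}

for r1 in (2, 1, 0):
    print(r1, [round(table[(r1, r2)], 1) for r2 in (3, 2, 1)])
```

All nine values are distinct, which is why no repeated factors arise in the partial fractions for this example.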

Each term of the double sum has then to be resolved into partial fractions, becoming $\sum\sum a_{r_1 r_2}/[f(r_1, r_2) + \lambda]$, and the different terms collected, for example as $a_{23}/(6.6 + \lambda)$, $a_{22}/(8.2 + \lambda)$, and the coefficients $a_{r_1 r_2}$ computed. Corresponding to these we get inverse Laplace transforms $a_{23}e^{-6.6t}$, $a_{22}e^{-8.2t}$, etc., and their sum gives the mean susceptibles $M$ at time $t$. Then $-dM/dt$ is the number of 'cases' at time $t$.

This is a relatively simple case, though the computations are lengthy, since there are no two values of $f(r_1, r_2)$ alike. When there are repeated values, terms involving $(k + \lambda)^{-1}$, $(k + \lambda)^{-2}$, $(k + \lambda)^{-3}$, etc., appear in the double sum and the calculation of their coefficients is involved. For the numerical case cited above we obtain
                 Term in mean
                 number uninfected         Term of 'cases'
    f(2, 3)      -9.8791 e^{-6.6t}         -65.2020 e^{-6.6t}
    f(1, 3)      -4.9934 e^{-5.6t}         -27.9630 e^{-5.6t}
    f(0, 3)     -99.6923 e^{-3.0t}        -299.077 e^{-3.0t}
    f(2, 2)       0.6787 e^{-8.2t}           5.5650 e^{-8.2t}
    f(1, 2)       3.5457 e^{-6.8t}          24.1111 e^{-6.8t}
    f(0, 2)      14.3491 e^{-3.8t}          54.5260 e^{-3.8t}
    f(2, 1)      -0.8242 e^{-8.0t}          -6.593 e^{-8.0t}
    f(1, 1)      10.8647 e^{-6.2t}          67.3610 e^{-6.2t}
    f(0, 1)      88.9412 e^{-2.8t}         249.035 e^{-2.8t}

The sum of the terms in the first column at $t = 0$ is 3, as it should be, and the sum of the terms in the second column for various values of $t$ gives cases as follows:

    t         0      0.1    0.3    0.5    0.6    1.0
    Cases     1.76   2.21   2.67   2.54   2.37   1.44
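The two checks just described can be repeated by summing the tabulated exponential terms. The sketch below uses the coefficients as read from the table of terms, each case term being $f(r_1, r_2)$ times the corresponding coefficient of $M$; the names are illustrative, and small discrepancies in the last figure are to be expected from the rounding of the coefficients.

```python
import math

# (rate f(r1, r2), coefficient of exp(-rate * t) in the mean number uninfected)
TERMS = [
    (6.6, -9.8791), (5.6, -4.9934), (3.0, -99.6923),
    (8.2, 0.6787), (6.8, 3.5457), (3.8, 14.3491),
    (8.0, -0.8242), (6.2, 10.8647), (2.8, 88.9412),
]

def mean_uninfected(t):
    """M(t): sum of the first-column terms."""
    return sum(coef * math.exp(-rate * t) for rate, coef in TERMS)

def cases(t):
    """-dM/dt: each term of M(t) multiplied by its own rate."""
    return sum(rate * coef * math.exp(-rate * t) for rate, coef in TERMS)

for t in (0.0, 0.1, 0.3, 0.5, 0.6, 1.0):
    print(f"t = {t:.1f}  M = {mean_uninfected(t):.3f}  cases = {cases(t):.2f}")
```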
This is a case curve which rises steadily to a maximum and then decays to zero as $t$ increases.

I have been unable to obtain a general formula for $\sum\sum m(r_1, r_2)$ in the general case. It involves sums of terms like those in a hypergeometric series except that the terms are quadratic in $r_1$, $r_2$ and in general without factors.

The general mathematical model of infection used here is likely to be approximately true. There are good epidemiological grounds for thinking that it applies only to small
groups for which $n_1$, $n_2$ do not exceed five, and that instead of two cross-infecting groups there are many. Even for two groups of four the full expression for $\sum\sum m(r_1, r_2)$ in the contracted notation of equation (10) is a yard long. Leaving aside the question of three cross-infecting groups, which would involve a three-dimensional group of cells instead of the two-dimensional group in §2, there is a special case for which a general formula can be obtained for the mean susceptibles in either group for any values of $n_1$, $n_2$. This is when $a$, $b$, $a'$, $b'$ are equal. We imagine constant fractions of the infectives and of the susceptibles in each group travelling to the other for a given infinitesimal interval, then returning to their own group for another different given infinitesimal interval and following this by another visit to the other group and so on. This gives (1) but depends on the doubtful procedure of splitting infinitesimal time intervals, but if it is granted, the special case corresponds to the state of affairs in which only the susceptibles, and all of them, travel to and fro, spending half their time in each group, or alternatively when only the infectives, and all of them, travel, spending equal times in each group. Choosing the unit of time to make $a = 1$, we consider this special case.

For simplicity taking $h = 1$, and $n = n_1 + 1 = n_2$, $f(r_1, r_2)$ factorizes into $(r_1 + r_2)(2n - r_1 - r_2)$, and the table for $f(r_1, r_2)$ is:

                 r_2 = n        n-1            ...    2              1
    r_1 = n-1    (2n-1).1       (2n-2).2       ...    (n+1).(n-1)    n.n
          n-2    (2n-2).2       (2n-3).3       ...    n.n            (n-1).(n+1)
          ...
          0      n.n            (n-1).(n+1)    ...    2.(2n-2)       1.(2n-1)

This shows that there are many repeated entries but that any approved route from the beginning of the epidemic, $(n - 1, n)$, to its end, $(0, 1)$, passes through each of the possible values of $f(r_1, r_2)$ twice and twice only.
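The factorization and the repetition of values along a route are easy to confirm for a particular $n$. The sketch below is an illustration under the stated special case ($a = b = a' = b' = 1$, $h = 1$, $n_1 = n - 1$, $n_2 = n$); the function names are not from the paper. It relies on the fact that every step of an approved route lowers $r_1 + r_2$ by one, so that any route from $(n - 1, n)$ to $(0, 1)$ meets the sums $2n - 1, 2n - 2, \ldots, 1$ in turn.

```python
from collections import Counter

def f_general(r1, r2, n1, n2, h, a, b, a1, b1):
    """Total infection rate in state (r1, r2), equation (3)."""
    return (a * r1 + a1 * r2) * (n1 + h - r1) + (b * r2 + b1 * r1) * (n2 - r2)

def route_values(n):
    """Check the factorization on every cell, then count the values of f
    met along any approved route from (n - 1, n) to (0, 1)."""
    n1, n2, h = n - 1, n, 1
    for r1 in range(n1 + 1):
        for r2 in range(n2 + 1):
            s = r1 + r2
            assert f_general(r1, r2, n1, n2, h, 1, 1, 1, 1) == s * (2 * n - s)
    # r1 + r2 takes each value 2n - 1, ..., 1 exactly once along a route.
    return Counter(s * (2 * n - s) for s in range(2 * n - 1, 0, -1))

counts = route_values(4)
```

For $n = 4$ the values 7, 12 and 15 are each met twice, while the central value $n \cdot n = 16$ is met once, corresponding to the single occurrence of $\mathbf{n} + \lambda$ noted later in the text.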


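We now write $x(2n - x) = Q(x) = \mathbf{x}$, say, with the understanding that whenever several clarendon symbols are connected by signs the complete set of them forms the argument for $Q$. Thus $\mathbf{x - y} = Q(x - y)$, not $Q(x) - Q(y)$. It then follows that

$$f(r_1, r_2) = \mathbf{r_1 + r_2},$$

$$(\mathbf{2n - s - m - 1} + \lambda)\,\theta(n - m, n - s) = (n - s - 1)(s + m),$$

$$(\mathbf{2n - s - m - 1} + \lambda)\,\phi(n - m, n - s) = (n - m)(s + m),$$

and

$$\sum\sum m(r_1, r_2) = \frac{n}{\mathbf{2n - 1} + \lambda}\left[1 + \frac{n - 1}{\mathbf{2n - 2} + \lambda}\,\{\psi(n - 1, n - 1) + \psi(n - 2, n)\}\right],$$

and by repeated use of

$$\psi(r_1, r_2) = 1 + \theta(r_1, r_2)\,\psi(r_1, r_2 - 1) + \phi(r_1, r_2)\,\psi(r_1 - 1, r_2),$$

we find that

$$\sum\sum m(r_1, r_2) = \frac{n}{\mathbf{2n - 1} + \lambda}\left[1 + \frac{(2n - 2) \cdot 1}{\mathbf{2n - 2} + \lambda}\left[1 + \frac{(2n - 3) \cdot 2}{\mathbf{2n - 3} + \lambda}\left[1 + \cdots + \frac{2 \cdot (2n - 3)}{\mathbf{2} + \lambda}\left[1 + \frac{1 \cdot (2n - 2)}{\mathbf{1} + \lambda}\right] \cdots \right]\right]\right]. \tag{12}$$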


Since $\mathbf{x} = \mathbf{2n - x}$, all such terms as $\mathbf{x} + \lambda$ occur twice in denominators, except $\mathbf{n} + \lambda$, which occurs once only. The coefficients of $(\mathbf{r} + \lambda)^{-1}$ and $(\mathbf{r} + \lambda)^{-2}$ in the expansion of (12) in partial fractions must be found, and we can, without loss of generality, consider all values of $r < n$ and then the isolated value $r = n$.

All the terms containing $(\mathbf{r} + \lambda)^{-2}$ and not merely $(\mathbf{r} + \lambda)^{-1}$ are included in $T_r$, where


T, = A,B, and 
4,= n(2n — 2)! (2n —r — 1)! - 
"T GNU 1 FA) FA) FFI FA)... (0-123) (0:2) 


'"- 1) (2n-r | 3) (2n—r4 2.(9n—3)[, , 1. (9n — 2) 
bi 2 ze t x Ditt 24À [+ IFA ]J- 
(13) 


r-14A 


&üd's , : . 
din partial fractions T. is 


2 —142A9, (14 
Cal(L A) + coe Leg NEEN 65] (0:2) ANEA edm 18A. 06) 
a 15) 
m dy, = Tir +A)?| Cy = ax te tA etc. ( 
Noting that n” = n(n—1)...(n—r+1), the series B, becomes 
2 (» — 1)2 (r — 2n)? 16) 
Re (r= In), (= P= N= 2M) 4 to y terms. 
raf Meroe n) jt DE-A 


Tis 
found 
that $r 19i [r21,2, ...,(n- U]] 


Ora oe -(—y-32n(n—r7) (2n. — 2)/[r— 1)! Qn — I 
2 (y 2) (p — Qn) Ea 
P €: : PR UM E C sg — +... to r terms 
TEE (D 3!(2r—2n— 
1! (2r—2n— 1) 
-Fü-r, 2n —r;2n—2r4li 1) E 
x > (— y (2n — 2r)! (r— (m - r= D: — pa 
" d = 4n(n 1) (2n — 2)!/(r— DAP l 
fu rr P | 
"they €, — |B, 2 (A Qe ryyp:k 4,0 m) zm 
9 rr TOÀ r 
p à (20) 
A *ryr—11!22) 242). (071* ) — 
ape Erate M : 
os 1/(A+k)}- 
a2 € ioi) 
ji rk) t 0 Km 
Seti = = n(2n—2)! (2n- r- 1)! (z 1/( 
Minds it is found that 
a | 
lo- remo ar [a e *? sa 1 "$ ox (21) 
r-1 mA gid P: P . 
ET (yami mm z=1 
If $r = 1$, the first two sums of reciprocals in (21) are omitted.
Since the evaluation of $\partial B_r/\partial\lambda$ by differentiation of each term of $B_r$ above leads to an awkward expression, the general term of $B_r$, viz.
(—F (r-1y?(r-2ny9|(r -142)(r-242)..(r-s14A)(r-x42) (ro! ui 
may be put into partial fractions as follows 
Nm "n e ue me o — (23) 
sea (7 7 (s— 1)! (w—8)! (2r -1—2n — sf? (r — s - x) 
On differentiating this with respect to A and putting A = — r, there results 
z — 2(—)*-8(r— 1) (r — 2n)™ (r —n — s) z (24) 


= Mg(ms) say. 
s2138! (x — 8)! (2r — 2n — s + 19 (2r — 2n — 8)? P s) y 


—1 z 95 
io ica Bl 2 7€ $2. en 
aa A=-r z—18-1 
h r-1r-1 
This double sum may be expressed as X; Y; g(z, s), or as 
s=lz=s 


"a (—y2(r-n-— s) ((—)8 (7-1) (r — 2nyo (= 8 (r — 1)6+ (r — 25 en 

s2188! (2r —2n — s? | 0! (2r 2n — s— 19 1! (27 — 2n — s — 1)6+D * 

(-y9(r-1y-»(r— my) 
(r-1-s) r-2n-s- D 

g > 2(r—n—s) (r—1) (r—2n)® — [v-r-1-s(—)r (y — 2n — s) (r — 1 — sy 

PEE ETE endi ji y! (2r—2n — 1 — 28) 
E- 3(r—n—8) (r 19 (r — 2n)9 F(s-- 1 —7, 2n —r-e5; 2n 25 — 2r 4-1; 1) 
ss! (2r — 2n — s? (2r — 2n — s — 1)® 


Fast 


s=1 

* 2 gs! 
= 2 2(r—n—s) (r — 1)! (r—2n)/(2r — 2n — 2s — 1-1-9 (2r — 2n — s — 1) (2r — 2n — 5) 
Since (2r 2n - s - 1ye-0 = (2r - 2m — s — 1) (2r — 2n — 2s — 1ye-1-9, 
the above becomes 


r—1 
“= th 2 (r-n—s) (r— 2n)9|(2r— 2n — s — 1)e-» (2r — 2n — s)? ss!. 
Further, as 


(2r 2n — 1) (r— 2ny9 = (2r 2n — 1) = (2r — 2m — 1) (2r -2n—5— p^ 
we have 


9B, 


= -—]1yYr-b-—T - " TY » 2): 
(27 —2n — 1) "o 2(r- 1)! X 8r-2n— 167» (rn —s)/s(2r— 2n — 5) 8! (r e? 
= oB 
EE DA ha 5720- Da- 3). 


Expressing 2(r—n—8)/s(2r—2n—s) as 1s+1](p+s), 


where p = 2n —2r, we have 
OB, 


r-1 2a 
E or * X fs e V8 (- y-*(—1) (r—2) sed (s+1)/(p+s) (p+s+ 1) saë qnt 


I iryo (2n - t 
Mo (en n—2+A)...(2n—r+A)(2n—r—1)! 
Grae inrer (14 otan- f , C220 1-9] Jl -— 
U r+1ta ; n 
If this is written out as two series, putting the terms in the order for which $s = r - 1, r - 2, \ldots$, there results
= l 1 
z —1) 2 Tie 
(p+r— l - (r | (-060-2 
p+r—1)[r-1 @ 2D (par A 6-3)»*r-2(»*7-9) to (r—1) terms 
D sieh (210-2 
+— = 2 
ptr-l +r- (ptr-3P (p&7-3) to (r—1) terms| 


—p-r+2; La) - jd e (424 (per- 2]. 


EX s . "i 
(p--r— Lyr jn [(aPtr-2 4-733) a-r, T 
(27) 


Where t] 
ne suffix li —7 
ffix 1 in the hypergeometric indicates that its last term is to be omitted. Ifr=2 


he in: 
teeral ; 
Fam is to be omitted and if r — 1, 0B,/0A — 0. 
Un where erms in EXm(n rs) containing an unrepeate 


d factor (r +A) are included in 


n (2n — 2)! (r — 1)! 


(+1) (2n—7—2) 


nci =: 
iin: Qn-r-242) ri24A 
and th 
e ; 
" coefficient of (r 4- A)-! in this is «(À+ r) i-e which simplifies to 
Pe 
A(n — 2)! (2n— 25) F(r, r- 1— 2n; B+ _ on; [nr -D = 1, 2, «+037) 
(29) 


Gat} 
lerir 
Susceptiki the terms and taking the inverse Laplace transforms, the mean number of 
$ es in the group originally uninfected at time f is 
ro | (~ yr 
| F- n(2n.— 2)! (2n — 2r)! P(r, 12^; 2r4.1—2n; D/[(m 7 DIT 
- ~ PME 2n—2r+1 
2n(n—r) a-a- X "ma P à a ri) finr ect 
2-1 g-2n-2r*l 1 
* (— Y 2n(n —r) (2n — 2)! 
[r= 1)!f? (2n — 2r — 1)! (£n —r— 1) 


—r; —g3)- ides (r= re 


x b 

l B aaa RR 1;2-P 

" iilis a (2n — 2) tJ(r — 1)! (9n 77 — | exp[—"(22— 7)t]. (30) 

With Which 
th (19 


n be obtained. In computations it is better to work 


a the mean number of cases car 
nd use (30) as a check. 

eptibles in the group: numbering 7, 
T9) = rt Ta) 


ig i, Le method of finding tl 
il o ng the mea quanti went 


ürt à 
9 that in $2, for equation (1) 

"(Ng —rg)) 

e of origin of the epidemic. 


e 
ciate to the P^ to 1. 


Are 
q 

The “fined, Een i — 
Dr e primes indicating that t um to 0n 1 o, from n 


9e 
e 
u i = "| 
re is as before, but now "2 runs 
If np =3, ny =2, h=1, a=}, b=3, a’ =}, b'=4}, we have 


Table of f(r,, 72) Table of j'(r,, 72) Table of Bry ro) a 
T> 3 2 1 0 TQ— 3 2 1 0 T> 3 A ji 3 
$ 9 2 11 2 23 
St 7 8$ ë n 2 1 3 3 0 n 2 H a 
$ 3 8S 4 * Ti? 34 0 /— L| 1 0 o 


Hence 


XD Nr, ro) 


à = ette 
ndn: = 20,3)/[/(2,3)-- A] = 201 +6(2, 3) (2, 2) + Ø(2, 3) Y(1, B)I/Lf(2, 3) +A] = ° 
giving 
2 1 33 ae 2 
EXn(r, ra) Fol tg [itg ei tira tea 


=a n-À 
21 3r 3 1 2 3 4 3j J: 
2 1 1+ 
tape tanl seat apa et) Waa TAA 


T - in the 
The coefficients of 1/(74+A), 1/(74+A)? in the above are those of e~7 and te 
expression M for the mean number of susceptibles, which is 


M = 35:2079e! — 54-09796738 4 17.914363 4. 3756-8 "T 
+27-6571e-# 1-517969 + (6-40591 — 29-9493) 7l, 


yielding the following table: 


Cases-—dM/dt 3 294 252 208 146 QT MM 62 
ù x 0? o18 030 042 000 090 180 jc 
iden? 
The cases show a, steady decline with elapse of time. For the group to which the hat y” 
originally spreads when n=3, m=2, h=1,a=3, b=2, a’ = t b' =4, the cases are 8!” 
formula similar to (31) yielding the result: 


Cases 1 — 2399 2.523 2476 1-916 — 0-469 Md e) 
t 0 03 0-54 0-6 0-9 1-2 3:0 
g and 
It is seen that the cases in thi 
then decline slowly to zero, 


When the cases in both 


205 
8 group rise slowly to a maximum at about time = 9 
jjal 
aA P 
* a i e 
rising to a maximum and declining rapidly is ing? 
about time = 0:24, at which time the cases fot” ^ jas 


total cases in two cross-infecting g 
the same total size. 


5. The individual probabilities $p(r_1, r_2)$ can be calculated by the foregoing method. This is illustrated for the special case, taking $n_2 = 3 = n_1 + 1$ and considering the group initially uninfected. $\sum\sum q(r_1, r_2)$ now includes the quantities $q(r_1, 0)$, since $\sum_{r_1} p(r_1, 0)$ is found from $q(2, 0) + q(1, 0) + q(0, 0)$. It is found that

$$\sum_{r_1} p(r_1, 0) = 1 + (29 - 48t)\,e^{-5t} + (7 - 84t)\,e^{-8t} - 37e^{-9t}, \tag{a}$$

$$\sum_{r_1} p(r_1, 1) = (-37 + 48t)\,e^{-5t} + (120t - 20)\,e^{-8t} + 57e^{-9t}, \tag{b}$$

$$\sum_{r_1} p(r_1, 2) = 6e^{-5t} + (15 - 36t)\,e^{-8t} - 21e^{-9t}, \tag{c}$$

$$\sum_{r_1} p(r_1, 3) = 2e^{-5t} - 2e^{-8t} + e^{-9t}. \tag{d}$$

If these probabilities are plotted against time on the same axes it is seen that (a) rises steadily from zero to unity while (d) decreases steadily from unity to zero; (b) and (c) rise to maxima of about 0.3, 0.32 respectively, attained at times 0.5, 0.24 respectively, and then decrease to zero.

At any time $t$ the graph which has the greatest ordinate shows the most probable number of susceptibles, $C$, and it is found that

    t                          0 to 0.32    0.32 to 0.37    0.37 to 0.45    0.45 onwards
    C                          3            2               1               0
    Approx. prob. at
    transition times                  0.3            0.278           0.295

Observational data would be obtained in this form.

6. F p s 
lie A convenient expression for the total mean number of susceptibles in the two groups 
also be obtained when fry r9) factorizes. By Way of illustration we now write 


Ale) = 
) = a(n, +n, +h — x) as x, then 
nh nn ra) = Tit Fe 


Th frs r9) = (i Ta) (03. 
© Laplace transform of the total mean susceptibles in the groups 1s 
LLU (1,72) = EE(r +r) 4o 79); 


and writi 
writing N = n, +n +h, 
+r- 1) (N -r= t) 


("1+ ra) J’ (ro 1) = r(?s 
y =f -1)U -n-'9 (34) 
there results (ry rg) E (ry 79) = nri "2 X 172 
RATS A 35 
Th [f(ry. r9) - A] ers, r9) = lr rs Dj (75 DT 1,7) E (ro 15%) (35) 
e 
table for f(r, 7,) is 
[ und 4 vite h ee Re —2) 
M : c ) (4-1) (n E D (h+ "à 2 
m] (h+1) Sant (a+ 2) (27 ) 
"=g (h+ 2) (ny +22) se 
| " n left are the same. The 


o bottor 


by the usual method, 


in 
Whi ' 
So] t all entries in diagonals running from toP right t 
n of the difference equation (35) gives: 
4ng) = Ves a) 
: fig — 1) + H(t M2) ym — 1,3 
a= "(nz na)lLf (i — b n3) +A]. 


[A+ flor, na] EDU ry r9)/ (s 

Wh = 1 4 O(m4 ng) YO 

Ere a : i a] (n 
Nr Ng) = j" (ny ns) [Lf "2 De 
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Inserting the values of Y (n4, n4 — 1), Y(n — 1, na) and of their successors there results 


T4 T 0. i l y oe a 
Eilers) = IR + one bU a E EPR 


1 
x (a -a- 1) (N 2h —2) h(h - 1) Aa umm 2S a 
continued until the term 1--A appears in the denominator. If ny=n,=n and h= l, 
remembering that now x = (2n + 1 — x), we find that 


1 (2n — 1)! 1! Qn—-1)(p-1! — — 
Elige [rra Dam" T tanp AFT) 0 2) p) 


(2n — 1)? |: (36) 


"TUO 0-2)... n) 
If the differential-difference equations for a single self-infecting community of 2n persons 


and one infectious person are written down (Bailey’s equation (5) with n replaced by 2n) 
and P(r) replaces rp(r), we have 


dP(r) 


+ 


a= r(2n —r) P(r +1)—7(2n4+1—r) P(r), 
Pen = —2nP(2n) 


Taking the Laplace transforms and noting 


r-—r(2n--1—7) =2n+1-r, 


(r +A) Q(r) = r(2n —7) Q(r+1) (37) 
(2n-- 2) Q(2n) = 2n } (r = 0,1, ..., (2n — 1)), 
whence 
MEE (2n — 1)! (2n.— 1)! 2! . (38) 
2 m 
zen "Ir auicm GT net | 


H H i Ds 
From (36) and (38) it is seen that Xl(r, ry) is identical with X Q(r). Hence the solutio 
x 1 T n ge 
obtained by the present writer for 2Q(r) in a self-infecting group, ean be used for this F 
of cross-infection. Hence the total mean susceptibles in the two cross-infecting group? 
2 2n! (2n-- 1—2y) 


2n—r 9) 
A e-e! Biens 1—2r)— = z+ (2n4-1— 20 exp[—7(2n4-1—7)0]- e 


7. If n in (39) is large, computations to find the number of cases at any time are lengthy, and even if n is small the general formula for cases, resembling (31), is not informative about the nature of the graphs of M, dM/dt against t. In this section it is proposed to develop an approximation to the total case curve for the two cross-infecting groups considered.

From (36)


n 
(40) 
Q(r) = 2n! (2n —r) (1 +r) (14 fr UA)... 1--2n[A)]H[(y — 1)! A27, 
and the Laplace transform of the total mean number of susceptibles is b Q(r) or 
2n[1 + 2n[A]71/A + 2n(2n — 1) [(1 *2n[A) (1-- 2n — 1/A)]3]12 ° 


41) 
+ 2n(2n — 1) (2n — 2) 2! [(1 + 2n/a) (1 *Qn- 1/0 + Qn 2]2)] [9 7 ( 
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Since À is er 

Deam M rre than any of the r, the binomials may be expanded and the coefficients of 

"iii ; in them are the sums of homogeneous products of appropriate dimensions of 
re of the quantities 2n, 2n—1, 2n —2, etc. For the first few powers of A we have 


$$\sum_{r} Q(r) = 2n\,[\lambda^{-1} - \lambda^{-2} - 2(n-1)\lambda^{-3} - 4(n^2-4n+2)\lambda^{-4} + 8(-n^3+11n^2-19n+7)\lambda^{-5} - 16(n^4-26n^3+107n^2-123n+38)\lambda^{-6} + \dots]. \qquad (42)$$

As $\lambda^{-k}$ has an inverse Laplace transform with respect to time of $t^{k-1}/(k-1)!$, the mean susceptibles at time t are

$$2n\,[1 - t - (n-1)t^2 - 2(n^2-4n+2)t^3/3 + t^4(-n^3+11n^2-19n+7)/3 - 2t^5(n^4-26n^3+107n^2-123n+38)/15 + \dots], \qquad (43)$$

and mean cases

$$2n\,[1 + 2(n-1)t + 2(n^2-4n+2)t^2 - 4t^3(-n^3+11n^2-19n+7)/3 + 2t^4(n^4-26n^3+107n^2-123n+38)/3 \dots]. \qquad (44)$$

This is [...] than the method given by Bailey (equations (15)-(23)), but only slowly [...], and a practical alternative to the expansion of the exponentials with their [...] coefficients.
It has been found that the case curve cannot be approximated to satisfactorily by one, or even two, curves of the type $(a+bt)e^{-ct}$ of which it is composed. Since it appears to be nearly a truncated normal curve, the first three terms of (44) were supposed to commence the expansion of $k\exp[-a(t-b)^2]$, giving $a = 4n-2$, $b = (n-1)/(4n-2)$, which can then be written

$$2n \exp\{(n-1)^2/(4n-2)\}\,\exp\{-(4n-2)[t-(n-1)/(4n-2)]^2\} \times \{1 - 4t^3(n-3)(2n-1)/3 - 2t^4(2n-1)(n^2-11n+16)/3 + \dots\}, \qquad (45)$$

where the series, producing skewness, must be continued to many terms.
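The coefficient matching behind (45) can be checked with exact rational arithmetic. The sketch below is an illustration added here, not part of the original paper: the function names are hypothetical, and the coefficients of the mean-case series (44) are assumed to be 1, 2(n-1), 2(n²-4n+2), -4(-n³+11n²-19n+7)/3 and 2(n⁴-26n³+107n²-123n+38)/3. It divides that series by exp{2(n-1)t - (4n-2)t²}, which is the normal factor of (45) with its constant removed, and returns the residual series in powers of t.

```python
from fractions import Fraction

def mul(p, q, deg):
    """Product of two power series (coefficient lists), truncated at t**deg."""
    out = [Fraction(0)] * (deg + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= deg:
                out[i + j] += a * b
    return out

def exp_series(p, deg):
    """exp(p(t)) truncated at t**deg, for a series p with zero constant term."""
    term = [Fraction(1)] + [Fraction(0)] * deg   # current term p**k / k!
    total = term[:]
    for k in range(1, deg + 1):
        term = [c / k for c in mul(term, p, deg)]
        total = [s + c for s, c in zip(total, term)]
    return total

def residual_series(n, deg=4):
    """Series (44)/2n divided by exp(2(n-1)t - (4n-2)t**2), up to t**deg."""
    n = Fraction(n)
    cases = [Fraction(1),
             2 * (n - 1),
             2 * (n * n - 4 * n + 2),
             -Fraction(4, 3) * (-n**3 + 11 * n**2 - 19 * n + 7),
             Fraction(2, 3) * (n**4 - 26 * n**3 + 107 * n**2 - 123 * n + 38)]
    # reciprocal of the normal factor: exp(-2(n-1)t + (4n-2)t**2)
    inverse_normal = exp_series(
        [Fraction(0), -2 * (n - 1), 4 * n - 2, Fraction(0), Fraction(0)], deg)
    return mul(cases, inverse_normal, deg)

if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, residual_series(n))
```

Under these assumptions the terms in t and t² vanish for every n, which is what fixes a and b, and the remaining coefficients are those of the skewness series in (45).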
For a given numerical value of n a reasonably good approximation to the case curve can be obtained by the method of Karl Pearson. When n = 2, 3 the mean cases are, from (39),
E 
(—23-+ 36t) e+ (24+ 18t)e 


an, 
: -12 4. (27008 — 645) et + (900t — 489) gm, 


s 
““Pectively (1140 + 7201) e 


ut the f-axis are — 0-3918 and 
the t-axis from these points to 


for which these curves © 
e respective areas and finally 


si 
: igo o E negative values of t rri 
Infini 0 respectively. The areas under the curves and od 
th "hs re easily found, the cases at time t are is eal meet 
1n is shi e A e à 
156-081) e^, 


y(T) = (—146:914- 142-59 T)€ 


tersection O axis. 
ation of the a 
a7 4 (146-91 + 


(|, =1. 
0 


Which s 
is ]i "M 
S like a probability density function 1n that we 


The 

first four moments of y(T) about T —0 and the mean 

- 1-0679, Jn = 1-5880 
= 01597. 


pa = 08370, Hs 
20.0809, /4 


anq 
m 0-1996, /s 


2 
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These yield f, = 1878 = 0-823, fo = Hale = 401, 


which, by Biometrika Table 43, indicates the Pearson Type I curve as the best fit, although 
Type III might fit better for higher values of n. For n=2 the cases at time ¢ are 


const. x (1 4-£/0-3918):9! (1 —t/D,), | b, 5:88. 
This gives the mode at time 0-16, rather earlier than the correct value 0-21, obtained by 
solving y’(t) = 0. 

Using T in the sense above and using a Type III curve it is found that 


y(T) = eT T2193 for n=2, y(T)-e-599T T?55 for n=3. 
It is surmised in general that y(T) 2-e-?^7 T?*», where p is positive. An improved fit could 


be obtained by calculation of exact case values, using (39), for selected times and applying 
maximum likelihood to calculate the constants. 
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SOME STATISTICS ASSOCIATED WITH THE RANDOM 
DISORIENTATION OF CUBES 


By J. K. MACKENZIE AND M. J. THOMSON

Division of Tribophysics, Commonwealth Scientific and Industrial Research
Organization, University of Melbourne, Australia

A Monte Carlo method is used to estimate the frequency functions of various angles in a random aggregate of cubic crystals. Estimates are made of the frequency function for the angle of disorientation, i.e. the least angle of rotation required to rotate a crystal into the same orientation as a neighbouring crystal, and for the angles Min ⟨100⟩, Min ⟨110⟩, Min ⟨112⟩, Min ⟨123⟩ and Min [⟨110⟩, ⟨112⟩, ⟨123⟩], where Min ⟨100⟩ is defined as the least of the nine acute angles between ⟨100⟩ directions in neighbouring crystals and similar definitions apply for the other angles.


1. INTRODUCTION 


This paper is concerned with the estimation, by means of random sampling, of some probability distributions which arise from a class of problems in three-dimensional geometrical probability. The results obtained are of interest to metallurgists in particular and perhaps to crystallographers in general. They are presented here in the hope that some statistician may be sufficiently interested to try to obtain the exact distributions in a more or less explicit form.
Before stating the specific problems it is desirable in the interests of the general reader to explain the standard crystallographic notation for directions and planes. A particular direction can be specified by the components u, v, w of a vector (in this direction) relative to an orthonormal basis and the symbol [uvw], in square brackets, is used to denote this direction. Thus, [100] is the direction of the x-axis. A cube with its centre at the origin and edges parallel to the base vectors (axes) is invariant under the 48 symmetry operations of the cubic group, consisting of 24 operations which are proper rotations and 24 which combine a rotation with an inversion or reflexion. Starting with a given direction [uvw], 48 equivalent directions (24 lines in all) can be derived by the use of the symmetry operations and are called variants of [uvw]. These are simply derived from the given direction by permuting the indices u, v, w in sign and in order in all possible ways. This set of equivalent directions is denoted by ⟨uvw⟩, in carets, and not all the 48 variants need be distinct, e.g. the set ⟨100⟩ consists of the set of 6 directions [100], [010], [001], [1̄00], [01̄0], [001̄], the bar over an index being used instead of a minus sign. When h, k, l are the components of a normal to a particular plane, this plane is referred to as the plane (hkl), in brackets, while the set of all planes equivalent to the plane (hkl) is derived in a similar way and denoted by {hkl}, in braces. These symbols for planes are not used in the present paper.

The problems considered can be stated as follows. Given a reference line (a pair of opposite directions) and another single line randomly distributed on a sphere, what is the probability distribution of the angle between them, or of the least angle of rotation required to make the random line coincide with the reference line? It is known that the cosine of both these angles is uniformly distributed in the range (0, 1), but what is the answer if instead of two single lines there are two [...] sets of lines, the lines of each set being fixed relative to one another? The present paper gives practical answers to some problems of this latter type when the lines of each set are invariant under the rotations of the cubic group.


2. STATEMENT OF PROBLEMS 


Consider two cubes, A and B, and imagine A to be a fixed reference cube and B to be initially coincident with A but free to rotate in any manner about the common centre of A and B. If B is rotated through an arbitrary angle about some arbitrary axis there are 24 definite rotations which will restore B into coincidence with A. These rotations are just the reverse of the original rotation taken together with the 24 proper symmetry operations associated with a cube having indistinguishable faces. Further, each of these 24 rotations can be represented as a single rotation about some definite axis and through some definite angle. Then, of the 24 angles of rotation so defined, there is one (or more) which is least in magnitude and this least angle may be taken as a measure of the disorientation of the two cubes and will be called the angle of disorientation.

In 1949, F. C. Frank proposed over morning coffee the problem of determining the greatest possible angle of disorientation of two cubes. The answer to the analogous problem for squares in two dimensions is of course 45°; but in three dimensions the answer is not at all obvious. However, by a tedious consideration


2 2) = 62-80, and that = 
n is most simply described as a peer ^ 
el to a face diagonal. A more s 
of the angle of disorientation when ic 
An estimate of this probability dis 


) 


ro- 
eighbouring crystal. Then it may be asked what P i 


d 
TA "T " 
The method of calculation is described in §3 and the results are given in §4. The method of calculating the random orthogonal matrices and their testing for randomness is set out in §5.


3. METHOD OF CALCULATION

The calculations were performed by the method of random sampling. Since any rotation can be represented by an orthogonal matrix (Jeffreys & Jeffreys, 1946, p. 114), 150 random orthogonal matrices were constructed as described in §5. Then with each matrix the following calculations were made.
If a given matrix represents a rotation through an angle θ about some axis, then the trace of the matrix (i.e. the sum of the elements in the leading diagonal) is equal to 1 + 2 cos θ. Thus, to determine the angle of disorientation it is necessary, in principle, to calculate the trace of each of the 24 matrices found by combining the given matrix with the 24 symmetry rotations of a cube and to choose the greatest of these 24 values. However, this greatest value can be found easily from the following rule. Taking into account only the magnitude of the elements, add to the largest element the greater of the two diagonal sums of the four elements not in the same row or column as the largest element. This rule can be derived by straightforward but tedious consideration of all the possibilities.

Since the elements of the given matrix are just the cosines of the 9 angles between the new and the old ⟨100⟩ directions, Min ⟨100⟩ is determined by the element of largest magnitude in the matrix.
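The trace rule can be compared with the direct search over the 24 symmetry rotations it replaces. The following Python sketch is an added illustration, not the authors' procedure (their work was done by hand from tables); the function names are hypothetical and the matrix is assumed to be a proper rotation given as a list of three rows.

```python
import math
from itertools import permutations, product

def _angle_from_trace(trace):
    """Rotation angle in degrees from trace = 1 + 2 cos(theta)."""
    return math.degrees(math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0))))

def disorientation_by_rule(R):
    """Angle of disorientation from the largest-element rule."""
    a = [[abs(v) for v in row] for row in R]
    # position of the element of largest magnitude
    i, j = max(((r, c) for r in range(3) for c in range(3)),
               key=lambda rc: a[rc[0]][rc[1]])
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    # the two diagonal sums of the four remaining elements
    diag1 = a[rows[0]][cols[0]] + a[rows[1]][cols[1]]
    diag2 = a[rows[0]][cols[1]] + a[rows[1]][cols[0]]
    return _angle_from_trace(a[i][j] + max(diag1, diag2))

def disorientation_brute_force(R):
    """Angle from the greatest trace over the 24 proper cube rotations."""
    best = -3.0
    for perm in permutations(range(3)):
        inversions = sum(perm[x] > perm[y]
                         for x in range(3) for y in range(x + 1, 3))
        parity = -1 if inversions % 2 else 1
        for signs in product((1, -1), repeat=3):
            if parity * signs[0] * signs[1] * signs[2] != 1:
                continue  # determinant -1: not a proper rotation
            # trace of (signed permutation matrix) times R
            best = max(best, sum(signs[k] * R[perm[k]][k] for k in range(3)))
    return _angle_from_trace(best)

if __name__ == "__main__":
    c = math.sqrt(0.5)
    quarter_turn_half = [[c, -c, 0.0], [c, c, 0.0], [0.0, 0.0, 1.0]]  # 45 deg about z
    print(disorientation_by_rule(quarter_turn_half),
          disorientation_brute_force(quarter_turn_half))
```

For the 45° rotation about a cube axis both functions give 45°, and for a 90° rotation about a cube axis both give zero, as they should for a symmetry operation.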

The other calculations are all similar and only the determination of Min ⟨123⟩ will be described. First the matrix is used to calculate the new directions corresponding to each of the 24 (non-parallel) variants of [123] (the remaining 24 are obtained by changing all signs). If a particular member of the set ⟨123⟩ is transformed into $[u_1 u_2 u_3]$ then the cosine of the angle between this direction and the nearest direction of the set ⟨123⟩ is

$$\tfrac{1}{14}\,(|u_s| + 2|u_m| + 3|u_l|),$$

where $u_s$, $u_m$ and $u_l$ are the numerically smallest, middle and largest of $u_1$, $u_2$ and $u_3$, and the transformed vector keeps the length $\sqrt{14}$ of [123]. The greatest of the 24 cosines so calculated determines Min ⟨123⟩. A table showing the bounds for $|u_s|$, $|u_m|$ and $|u_l|$ consistent with a series of different angular deviations from [123] was used to reject most of the 24 possibilities by visual inspection; the cosines were only computed for the few remaining cases.

When the results of the calculations had been accumulated for the 150 matrices, the number of cases in which the various angles lay in suitably chosen ranges were counted and the numbers so obtained used as estimates of the corresponding frequencies.
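The calculation of Min ⟨uvw⟩ can be sketched in Python as follows. This is an added illustration with hypothetical function names; it computes every cosine rather than rejecting candidates by inspection as the authors did, and it assumes the rotation matrix is supplied as a list of three rows. Pairing the sorted magnitudes of the transformed components with the sorted magnitudes of u, v, w gives the cosine with the nearest old variant.

```python
import math
from itertools import permutations, product

def variants(uvw):
    """All distinct directions obtained by permuting and re-signing u, v, w."""
    out = set()
    for perm in permutations(uvw):
        for signs in product((1, -1), repeat=3):
            out.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(out)

def min_angle(R, uvw):
    """Min <uvw> in degrees for the rotation matrix R."""
    target = sorted(abs(c) for c in uvw)      # e.g. [1, 2, 3] for <123>
    norm2 = sum(c * c for c in uvw)           # e.g. 14 for <123>
    best_cos = 0.0
    for v in variants(uvw):
        new = [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]
        mags = sorted(abs(c) for c in new)    # smallest, middle, largest
        # cosine of the angle to the nearest old variant
        cos_nearest = sum(m * t for m, t in zip(mags, target)) / norm2
        best_cos = max(best_cos, cos_nearest)
    return math.degrees(math.acos(min(1.0, best_cos)))

if __name__ == "__main__":
    # rotation of 60 degrees about [111]
    R111 = [[2/3, -1/3, 2/3], [2/3, 2/3, -1/3], [-1/3, 2/3, 2/3]]
    print(min_angle(R111, (1, 0, 0)))   # compare with arccos(2/3)
    print(min_angle(R111, (1, 2, 3)))
```

The loop visits both members of each pair of opposite directions, which doubles the work but leaves the result unchanged.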


4. RESULTS AND DISCUSSION

The results are presented in the form of a series of histograms in Fig. 1. The ordinates have been normalized to represent probability densities when the unit of measurement along the abscissa is 1°, and the figures along the top of each histogram are the actual number of cases falling in the indicated range. If p is the estimated probability of an angle lying in a particular range, then an estimate of the standard error of p based on a sample of 150 is $[p(1-p)/150]^{1/2}$, and horizontal dotted lines have been drawn one standard error above and below the top of each rectangle of the histograms. The mean x̄ and the standard deviation s of the estimated distribution are given on each histogram and are also indicated by means of the arrow and range at the bottom of each histogram.

The dotted curves superposed on each histogram give an indication of the form of each frequency function. These have been adjusted so that the area under each curve is unity. [...] have a single maximum, [...] the case of Min ⟨110⟩ the existence of a [...] is by no means certain. However, there are, [...] divisions of the ranges, indications of more than one maximum in some of the other frequency functions and the nature of the problem suggests that the frequency functions may consist of a number of continuous arcs which join at sharp [...].



As was mentioned in the introduction, a tedious argument shows that the greatest possible value for the angle of disorientation is 62.80°. A similar argument shows that the greatest possible value of Min ⟨100⟩ is arccos(2/3) = 48.19° and that the rotation which achieves the corresponding disorientation is most simply described as a rotation of 60° about an axis ⟨111⟩. Although no further results of this type are known to the authors it is easily shown that a rotation of 45° about an axis ⟨100⟩ gives Min ⟨110⟩ = arccos[¼(2 + √2)] = 31.40°, and the results given in Fig. 1 suggest that this is probably its greatest value. Estimated values, x_m, of the greatest possible values of each variable are given in Fig. 1 and where a value is associated with limits of error it has been estimated as follows.

[Fig. 1: six histograms, with panels Angle of disorientation, Min ⟨100⟩, Min ⟨110⟩, Min ⟨123⟩, Min ⟨112⟩ and Min [⟨110⟩, ⟨112⟩, ⟨123⟩]; abscissa in degrees, ordinate probability density.]

Fig. 1. Histograms derived from a random sample of 150. The ordinate is a probability density when angles are measured in degrees. The figures at the top are the number of cases in each range while the horizontal dotted lines indicate limits corresponding to one standard error of the estimated total probability for each range. The mean x̄ and the standard deviation s of each estimated distribution are indicated by the arrow and range at the bottom. An estimate of the maximum value x_m of each variable is also given. The dotted curves indicate the general shape of each frequency function.

A rotation is uniquely defined by three suitable independent variables (e.g. the Eulerian angles) so that the variation of Min ⟨110⟩, say, can be represented by means of a four-dimensional hypersurface. Now experience obtained in the calculation of the maximum values of the angle of disorientation and Min ⟨100⟩ suggests that in all cases the appropriate hypersurface can be approximated in the neighbourhood of x = x_m by a number of hyperplanes all intersecting at a common point. Thus, when x is sufficiently near x_m the probability density would be expected to be proportional to $(x_m - x)^2$. Except for the angle of disorientation and Min ⟨100⟩ this expected behaviour can be roughly verified from Fig. 1 and x_m has been estimated by plotting $p^{1/2}$ against the mean value of x for the range concerned. The limits of error stated in Fig. 1 are no more than reasonable guesses and the maximum 30 ± 1° given for Min ⟨110⟩ is to be compared with the known result that the maximum is not less than 31.40°.

A simple argument accounts for the initial linear rise of the frequency functions for all the Min ⟨uvw⟩ in Fig. 1. If the set ⟨uvw⟩ has 2n members defining n distinct lines, Min ⟨uvw⟩ is the least of the n² angles θ_i between pairs of lines and cos θ_i is uniformly distributed on the range (0, 1). Now the method of inclusion and exclusion shows that Pr(Min ⟨uvw⟩ < θ) lies between the limits*

$$\sum_i \Pr(\theta_i < \theta) \quad\text{and}\quad \sum_i \Pr(\theta_i < \theta) - \sum_{i<j} \Pr(\theta_i < \theta,\ \theta_j < \theta).$$

Thus, for θ sufficiently small,

$$\Pr(\text{Min}\,\langle uvw\rangle < \theta) \simeq n^2 \Pr(\theta_i < \theta) = n^2(1 - \cos\theta).$$

The corresponding density function is n² sin θ, and the estimated frequency functions in Fig. 1 are in substantial agreement with this prediction up to an angle of about 24/n degrees.
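The small-angle prediction is easy to tabulate. The short Python sketch below is an added illustration with hypothetical function names; the numbers of distinct lines (3 for ⟨100⟩, 6 for ⟨110⟩, 12 for ⟨112⟩, 24 for ⟨123⟩) follow from halving the numbers of distinct variants, and the density is converted to a density per degree so that it can be compared with histograms whose abscissa is in degrees.

```python
import math

# number of distinct lines n in each set of equivalent directions
LINES = {"<100>": 3, "<110>": 6, "<112>": 12, "<123>": 24}

def small_angle_cdf(n_lines, theta_deg):
    """Approximation n^2 (1 - cos theta) to Pr(Min <uvw> < theta), small theta."""
    return n_lines ** 2 * (1.0 - math.cos(math.radians(theta_deg)))

def small_angle_density_per_degree(n_lines, theta_deg):
    """Density n^2 sin(theta), rescaled from per radian to per degree."""
    return n_lines ** 2 * math.sin(math.radians(theta_deg)) * math.pi / 180.0

if __name__ == "__main__":
    for name, n in LINES.items():
        limit = 24.0 / n   # the approximate range of validity quoted in the text
        print(name, limit, small_angle_cdf(n, limit),
              small_angle_density_per_degree(n, limit))
```

The approximation is only the leading term of the inclusion and exclusion bounds, so it should not be used much beyond the angle of about 24/n degrees quoted in the text.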


* This remark is due to Dr H. A. David of the Department of Statistics, University of Melbourne.

5. CALCULATION OF RANDOM ORTHOGONAL MATRICES

Since the elements in successive columns of an orthogonal 3 × 3 matrix can be regarded as the components of three orthogonal unit vectors, a random orthogonal matrix can be constructed as follows. Choose a random unit vector x and write its components as the first column. Choose a second random vector y' which is independent of x. These two vectors define a random plane and in this plane there is a unit vector y perpendicular to x. The components of y form the second column while the third column consists of the components of x × y = z which is normal to the random plane. Thus, the problem is reduced to that of computing the components of a random unit vector.

Let x_1, x_2, x_3 be three independent unit normal deviates with joint probability density

$$(2\pi)^{-3/2} \exp(-\tfrac{1}{2}\textstyle\sum x_i^2).$$

This density is constant on the surface of the sphere $\sum x_i^2$ = constant, so that given the value of $S = \sum x_i^2$ the probability that the point (x_1, x_2, x_3) lies in any area of the surface of the sphere is simply proportional to that area. Thus, the direction of the vector [x_1 x_2 x_3] is distributed uniformly and

$$x = [x_1\ x_2\ x_3]/S^{1/2}$$

is a random unit vector.

Similarly, we can find another independent random unit vector

$$y' = [y_1\ y_2\ y_3]/T^{1/2},$$

where $T = \sum y_i^2$. Then, if $P = \sum x_i y_i$,

$$y = [Sy_1 - Px_1,\ Sy_2 - Px_2,\ Sy_3 - Px_3]/[S(ST - P^2)]^{1/2}.$$

Finally, the remaining column of the required matrix follows by computing z = x × y.

The values of the random normal deviates were taken from the tables of Mahalanobis, Bose, Ray & Banerji (1934) and as a check on the overall accuracy of the calculations the column sums c_1, c_2 and c_3 were formed and it was verified that

$$\textstyle\sum c_i^2 = 3.$$

The standard deviation of divergences from this equality due to rounding off errors is about 2 units in the last figure retained in the matrix elements.

The distributions of $x_i/S^{1/2}$ and of the column sums c are closely related to the t-distribution and in three dimensions are both distributed uniformly* (Cramér, 1946, p. 240). Thus, although they are not independent, all nine elements of a random orthogonal matrix are uniformly distributed on the range (−1, 1) while the column sums are uniformly distributed on the range (−√3, √3).
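The construction can be sketched in Python as follows. This is an added illustration, not the authors' procedure: they took their normal deviates from published tables, whereas this sketch assumes a pseudo-random generator from the standard library, and the function names are hypothetical. The quantities S, T and P are those of the text.

```python
import math
import random

def random_rotation(rng):
    """Random proper orthogonal 3x3 matrix with columns x, y and z = x × y."""
    X = [rng.gauss(0.0, 1.0) for _ in range(3)]   # deviates x1, x2, x3
    Y = [rng.gauss(0.0, 1.0) for _ in range(3)]   # independent deviates y1, y2, y3
    S = sum(v * v for v in X)
    T = sum(v * v for v in Y)
    P = sum(a * b for a, b in zip(X, Y))
    x = [v / math.sqrt(S) for v in X]
    # unit vector in the plane of x and y', perpendicular to x
    d = math.sqrt(S * (S * T - P * P))
    y = [(S * b - P * a) / d for a, b in zip(X, Y)]
    z = [x[1] * y[2] - x[2] * y[1],
         x[2] * y[0] - x[0] * y[2],
         x[0] * y[1] - x[1] * y[0]]
    return [[x[i], y[i], z[i]] for i in range(3)]  # list of rows

def column_sum_check(M):
    """Sum of squared column sums; equal to 3 for an orthogonal matrix."""
    return sum(sum(M[i][j] for i in range(3)) ** 2 for j in range(3))

if __name__ == "__main__":
    rng = random.Random(1956)
    M = random_rotation(rng)
    for row in M:
        print(["%.4f" % v for v in row])
    print(column_sum_check(M))
```

The check on the column sums works because the vector of column sums is the transpose of the matrix applied to [1 1 1], whose squared length 3 is preserved by an orthogonal transformation.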

The 150 matrices were tested for deviations from these predictions by dividing the range of each variable into 10 equal parts and testing for uniformity of distribution by means of a χ²-test with 9 degrees of freedom. The greatest value of χ² obtained from the elements was 15.7 and for the column sums 22.6, while the corresponding mean values of χ² were 9.4 and 16.9. However, after permuting the columns in all the six possible ways,† these maxima dropped to 13.1 and 15.9 respectively while the corresponding means were 9.7 and 12.4. Thus there was then no significant deviation from uniformity of distribution at the 5% level (χ² = 16.9).
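The test of uniformity amounts to the usual Pearson statistic on ten equal cells. A minimal Python sketch, added here for illustration with a hypothetical function name, is given below; the resulting value would be compared with 16.9, the 5% point for 9 degrees of freedom quoted in the text.

```python
def chi_square_uniform(values, low, high, bins=10):
    """Pearson chi-square statistic for uniformity of values on (low, high)."""
    counts = [0] * bins
    width = (high - low) / bins
    for v in values:
        k = int((v - low) / width)
        k = max(0, min(k, bins - 1))   # keep boundary values inside the range
        counts[k] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

if __name__ == "__main__":
    # 150 values spread evenly over (-1, 1): the statistic is zero
    even = [-0.95 + 0.2 * k for k in range(10)] * 15
    print(chi_square_uniform(even, -1.0, 1.0))
```

With 150 values and 10 cells the expected count is 15 per cell, which is large enough for the χ² approximation with 9 degrees of freedom to be reasonable.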


The following four rotation matrices are typical of those computed by the above method: 


   0.8527   0.4846   0.1953        0.2294  -0.6454  -0.7286
   0.2780  -0.7374   0.6155        0.9035   0.4196  -0.0872
   0.4423  -0.4705  -0.7636        0.3620  -0.6383   0.6794

   0.0443  -0.9973   0.0584       -0.3763   0.5764   0.7254
   0.9773   0.0554   0.2045        0.7487   0.6504  -0.1285
  -0.2072   0.0480   0.9771       -0.5458   0.4947  -0.6763

* The cosine of the angle between a fixed direction and a random direction is uniformly distributed on the range (−1, 1). Hence, if ξ, the cosine of the co-latitude, is chosen at random in the range (−1, 1) and the longitude φ at random in the range (−½π, ½π),

$$[(1-\xi^2)^{1/2}\cos\phi,\ (1-\xi^2)^{1/2}\sin\phi,\ \xi]$$

is a random unit vector; the sign of the square root is taken positively and negatively at random. This method of calculation of a random unit vector is suitable for high speed computing machines.

† This removes any bias due to calculating the successive columns in a definite order.
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THE DIFFERENCE BETWEEN CONSECUTIVE MEMBERS OF A 
SERIES OF RANDOM VARIABLES ARRANGED IN 
ORDER OF SIZE 


By J. H. DARWIN 


Applied Mathematics Laboratory, Department of Scientific and Industrial Research, 
Wellington, New Zealand 


1. INTRODUCTION AND SUMMARY 


Very commonly in analyses of variance a table of, say, treatment means is set out in order of their size and the experimenter desires to find if there is a significant break between successive treatment means. The theoretical problem of finding the distribution of the difference between the mth and (m+1)th in a descending series of N from a known distribution dF(x) readily reduces to the evaluation of an integral. For few distribution functions F(x) can this integral be computed in terms of known tabulated functions. The consequent numerical evaluation is often so tedious that special sorts of comparisons between treatment means have been suggested (e.g. see Tukey (1949), when F(x) is the normal distribution function with an unknown variance).

In this note we find by the saddle-point method an approximate formula for the probability of a greater difference than u between the mth highest and the (m+1)th highest of a sample of N drawn from dF(x). This method was first used for approximation on the real axis by Laplace (1820), and, in work similar to ours but on the range of the sample, by Cox (1948). The probability we find when F(x) is the normal integral with known variance is compared with that obtained by use of the limiting distribution of the mth and (m+1)th as deduced by Gumbel (1935) for distributions dF(x) satisfying certain conditions. Numerical calculations for the 'unstudentized' normal case are compared with previous calculations of the same kind done by Irwin (1925) and the extension to the 'studentized' normal case is given without calculations.


2. THE SADDLE-POINT APPROXIMATION

Suppose we have N sample values x_1, x_2, ..., x_N arranged in descending order of size; suppose the parent distribution of the x_i is dF(x). Then the probability that x_m − x_{m+1} is greater than u is

$$P(u) = \frac{N!}{m!\,(N-m-1)!} \int_{-\infty}^{\infty} (1 - F(x+u))^m\,(F(x))^{N-m-1}\,dF(x). \qquad (1)$$

To use the saddle-point method on this integral we set m = Np [...]. The method can be expected to give reasonable results if p is near ½ and N [...] values of N and m. The accuracy of the approximation will then depend, especially when p [...], on the relative values of the first and second approximations, both of which we shall find in special cases.

The integrand is of the form

$$\exp\bigl(N[p\log(1 - F(x+u)) + (1-p)\log F(x)]\bigr)\,\frac{f(x)}{F(x)} = \exp(NG(x))\,\frac{f(x)}{F(x)}, \ \text{say.} \qquad (2)$$
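For the unit normal distribution the integral (1) can be evaluated directly by numerical quadrature, which gives a reference against which any approximation can be judged. The Python sketch below is an added illustration, not part of the paper: the function names, the integration range and the number of Simpson steps are assumptions. For N = 2 the result can be checked against the closed form erfc(u/2), since the difference of two independent unit normal variables is normal with variance 2.

```python
import math

def Phi(x):
    """Unit normal distribution function F(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def upper_tail(x):
    """1 - F(x), computed without cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def phi(x):
    """Unit normal frequency function f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def exact_P(u, N, m, lo=-10.0, hi=10.0, steps=4000):
    """Equation (1) for the unit normal by Simpson's rule (steps must be even)."""
    coeff = math.factorial(N) / (math.factorial(m) * math.factorial(N - m - 1))
    h = (hi - lo) / steps

    def integrand(x):
        return upper_tail(x + u) ** m * Phi(x) ** (N - m - 1) * phi(x)

    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return coeff * total * h / 3.0

if __name__ == "__main__":
    # difference between the two largest of N = 3 observations
    for u in (0.1, 0.2, 0.3, 0.4):
        print(u, round(exact_P(u, 3, 1), 4))
```

The printed values for N = 3, m = 1 can be set beside the values attributed to Irwin in Table 2.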




We suppose F(x) has continuous derivatives of the first four orders everywhere in its range. Any stationary value of G(x) is a solution for x of the equation

$$\frac{p\,f(x+u)}{1 - F(x+u)} = \frac{(1-p)\,f(x)}{F(x)}. \qquad (3)$$

Suppose there is a unique root at x̂. A useful approximation to it is

$$\hat{x} = \xi_p + a_1 u + a_2 u^2 + \dots,$$

where $F(\xi_p) = 1 - p$,

$$a_1 = -(1-p)\,[1 + p f'(\xi_p)/(f(\xi_p))^2],$$

and — (f(y)? +f" Ep) p — p) (a + 4) +f (Gp) f (Ep) Bai + (2— p) a, + (1 — p)) = 0. 
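The first approximation to P(u) is then

$$\frac{N!}{m!\,(N-m-1)!}\;\frac{(2\pi)^{1/2}\,(1 - F(\hat{x}+u))^m\,(F(\hat{x}))^{N-m-1}\,f(\hat{x})}{\left[\,m\,\dfrac{f'(\hat{x}+u)}{1 - F(\hat{x}+u)} + m\left(\dfrac{f(\hat{x}+u)}{1 - F(\hat{x}+u)}\right)^{2} - (N-m)\,\dfrac{f'(\hat{x})}{F(\hat{x})} + (N-m)\left(\dfrac{f(\hat{x})}{F(\hat{x})}\right)^{2}\right]^{1/2}}, \qquad (4)$$

where f'(x̂) and f'(x̂+u) = ∂f/∂x calculated at x̂ and x̂+u. Suppose this is written P_1(u).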
Then the next approximation to P(u) is P,(u)(1+a/N), say, where a is 
-5(G^(e) GIA) pF (2) myn 5 
ACO * Gaye " (8 (8) (8) Av io Xi» X5)- (5) 


In this X and Xf? are respectively the coefficients of p and 1— p in G(x), the ith 
partial derivative of G(x) with respect to x which is set equal to 2 after differentiation. 


For a given F, a is a function of p and u only and can be calculated if F and its derivatives have been tabulated.
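The first approximation can be sketched numerically for the unit normal distribution. The following Python code is an added illustration and rests on assumptions: the root of (3) is found by bisection rather than by the series in u or by inverse interpolation in tables, the first approximation is taken to be the Laplace form with the second derivative of NG(x) in the denominator, and the function names are hypothetical. For the unit normal, f'(x) = −x f(x).

```python
import math

def Phi(x):
    """Unit normal distribution function F(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def upper_tail(x):
    """1 - F(x) for the unit normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def phi(x):
    """Unit normal frequency function f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def saddle_root(u, p, lo=-8.0, hi=8.0):
    """Root of (3), p f(x+u)/(1-F(x+u)) = (1-p) f(x)/F(x), by bisection."""
    def g(x):
        # decreases from large positive to large negative values as x increases
        return (1.0 - p) * phi(x) / Phi(x) - p * phi(x + u) / upper_tail(x + u)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def first_approximation(u, N, m):
    """First saddle-point approximation P_1(u) for the unit normal."""
    x = saddle_root(u, m / N)
    F, f = Phi(x), phi(x)
    Q, fu = upper_tail(x + u), phi(x + u)
    # minus the second derivative of N G(x) at the root, using f'(x) = -x f(x)
    curvature = (m * (-(x + u) * fu / Q + (fu / Q) ** 2)
                 + (N - m) * ((f / F) ** 2 + x * f / F))
    coeff = math.factorial(N) / (math.factorial(m) * math.factorial(N - m - 1))
    return coeff * math.sqrt(2.0 * math.pi / curvature) * Q ** m * F ** (N - m - 1) * f

if __name__ == "__main__":
    for N in (3, 10):
        print(N, [round(first_approximation(u, N, 1), 4) for u in (0.1, 0.2, 0.3)])
```

The printed values can be compared with the rows of Table 2 that give P_1(u) for N = 3 and N = 10; the correction factor (1 + a/N) of the second approximation is not attempted here.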


3. THE NORMAL INTEGRAL 


For applications the most important function F (x) is probably the normal integral 


1 E 


We consider first that c is one. For this distribution f(a) |F 


" 25 3 i reasing 
function of x for all real æ. This is clearly true for zz 0. Fo d ii» mungiumia dipofe 


has the sign of T <0 the derivative of f(x) ^ (a) 
x = a ad 2 
-[ (we Ela) da — e-t? = af? = dx « 0. (6) 
Hence as 


fecu) — PG--u)) =f- (4-0) 


it follows that $[f(x)/F(x)]\,/\,[f(x+u)/(1 - F(x+u))]$ is a monotonic decreasing function of x. It decreases from infinity to 0 as x increases from minus to plus infinity. Hence there is a unique real root of (3).


By virtue of (6), G"(2) is less than 0 for all posit i 
? positive u, -point 
method is the real axis from minus to plus infinit ung iim 


^ y. Sine i í t 0 
&(p)+2(1—p) = —u, where the dependence of ? [n ide peace qe. 


; n p is expressed by writing 2 as ap) 
It follows easily that a(p) = a(1—p), where @(p) expresses a as a ind p- Some 


F(— (z4-u)) 


1 


p 
ti 
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calculations have be 
re been made from the formula (5) f 
p ae ; e mula (5) for values of p from 0-01 to 0- 
Wen fo 2 ma Log: the 5 % and 1 % significance points for N up to 10 = ibe ri € 
ud. aw (to judge from the approximation in $4); ma/N turns out to va " ord = 
or a particular value of u and m/N varies from 1/N to 3. The Hee aR 
}. ues o: 


ma[N are given in Table 1. 
Table 1. Values of ma/N 


Nl 

N 0-01 0-02 0-05 0-1 

PN 5 0 0-20 0-30 0-50 
0-1 0-0805 — — 0-0800 

oa oe = 0:0762 0:0758 0:0773 0-0819 a 
o5 : 697 —. a 0-0653 0-0646 0-0670 0-0863 

0-0597 0-0575 0-0547 0-0518 0-0489 0-0488 

2-0 0-0450 0-0425 0-0387 -0345 7 90821 
25 387 0-0345 0-0297 0-0275 0-0321 
3-0 0-0362 0-0330 0-0286 0-0243 0-0193 0-0165 0-0175 


We could use either of the equations

$$P_1(u) = 0.05 \ (\text{or } 0.01), \qquad (7)$$

or

$$P_1(u)(1 + a/N) = 0.05 \ (\text{or } 0.01), \qquad (8)$$

to give approximate significance points u. Naturally, one expects (8) to yield values of u closer to the true significance points. The evidence that we have for the accuracy of the points given by (7) and (8) is a comparison of the probabilities P_1(u) and P_1(u)(1 + a/N) with the probabilities for two distributions worked out by Irwin (1925). These distributions were of the difference between the two largest observations in samples of sizes 3 and 10. The comparison is given in Table 2.

Table 2. Comparison of approximations with Irwin's exact values

                      u:    0.1      0.2      0.3      0.4      0.5      0.7      1.0      2.0      3.0

N = 3   Irwin's values     0.917    0.836    0.760    0.687     -       0.493    0.339    0.069    0.008
        P_1(u)             0.8389   0.7699   0.7031   0.6389    -       0.4649   0.3228   0.0672   0.0079
        P_1(u)(1 + a/N)    0.9148   0.8348   0.7584   0.6857    -       0.4928   0.3388   0.0691   0.0081

N = 10  Irwin's values     0.855    0.727    0.613     -       0.427     -       0.152    0.011     -
        P_1(u)             0.7904   0.6734   0.5698    -       0.3994    -       0.1450   0.0110    -
        P_1(u)(1 + a/N)    0.8536   0.7245   0.6108    -       0.4958    -       0.1525   0.0113    -
One might expect that these distributions, corresponding respectively to a low value of N and a low value of p, would provide a severe test of our approximation. The success of P_1(u)(1 + a/N) over the whole range of u and of P_1(u) for the higher values of u suggests that (8) will give values of u sufficiently correct for practical purposes for N between 3 and 10, and that we will be very close to these values if we use (7). We give in Table 3 the significance points correct to two places for N from 3 to 10, deduced from (8). The figures deduced from (7) are given in brackets.

The solution of (3) necessary to give the tables was made from the Biometrika tables of the normal integral and frequency function, by prospecting till x̂ was contained between two neighbouring two-decimal values of the variable. Linear inverse interpolation between these values, at least for x̂ up to 2.5, gave a solution x̂ which appeared to be correct to 5 decimal places.


Table 3. Significance points for u = x_m − x_{m+1}

  N    Level    m = 1          m = 2          m = 3          m = 4          m = 5

  3     1%      2.91 (2.90)
        5%      2.17 (2.16)
  4     1%      2.60 (2.59)    2.17 (2.17)
        5%      1.92 (1.90)    1.58 (1.58)
  5     1%      2.42 (2.41)    1.87 (1.86)
        5%      1.77 (1.76)    1.34 (1.34)
  6     1%      2.30 (2.29)    1.69 (1.69)    1.56 (1.50)
        5%      1.67 (1.66)    1.21 (1.20)    1.11 (1.11)
  7     1%      2.21 (2.20)    1.58 (1.57)    1.39 (1.39)
        5%      1.60 (1.59)    1.12 (1.11)    0.98 (0.98)
  8     1%      2.15 (2.13)    1.49 (1.49)    1.28 (1.28)    1.33 (1.22)
        5%      1.55 (1.53)    1.06 (1.05)    0.90 (0.89)    0.86 (0.85)
  9     1%      2.09 (2.08)    1.43 (1.42)    1.20 (1.20)    1.12 (1.11)
        5%      1.50 (1.49)    1.01 (1.00)    0.84 (0.83)    0.78 (0.77)
 10     1%      2.04 (2.03)    1.38 (1.37)    1.14 (1.14)    1.04 (1.04)    1.01 (1.01)
        5%      1.46 (1.45)    0.97 (0.96)    0.79 (0.79)    0.72 (0.72)    0.70 (0.70)


i -decimal values of u for which P,(u) 
lay just above and just below 0-05 or 0-01. From these we could get the figure in brackets 
for the rounding off to 2 places, P,(~) 
for values of u between these was computed. 


two decimal places, 
resolved by the computation of alN whi Y aee 
obtained from the graph. 


3.1. The studentized test
In the application of Table 3 to testin 


g the differences betwe i in an 
s , €n successive means in. 
analysis of variance we should use means standardized by division siga 


-Point method to provide ? 
m41)/5 iS greater than u. In this case %m 
bservations, and s? is an estimate of the erro" 


k N E —G. 
is the mth highest of N means, each of n o 
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variance, with vs? having a x? distributi i 
aving a x? distribution with v degree: i 
may read this series from Hartley (1944). prd van con is: 
We have 


Prob (Jn (z,, — 2p 44) > us) 


l 1l fut... 3 2 
= Plu H5 u? P" —uP’ ce iy UO WW uus E e 

(u) >| uP) +5 a; a id geltek (9) 

where $P^{(i)} = (\partial/\partial u)^i P(u)$ and P(u) is as in (1). The derivatives of P(u) may be obtained from (5) by differentiation under the integral sign. Each ratio $P^{(i)}/P$ may then be approximated by the method of §2 by a power series in $f(\hat{x}+u)/(1 - F(\hat{x}+u))$, where $\hat{x}$ is again the solution of (3) and P(u) is approximated by $P_1(u)$. No calculations have been made for the studentized case, so that the accuracy of this last suggested approximation is unknown.
When there is not enough previous experience for σ to be well enough known to be assumed constant, a conservative procedure will be to use, not s, but an α confidence limit for σ. Thus if we take, say, $\sigma_0$ as a 95 % upper confidence limit for σ, then 19 times out of 20 when we make the statement that the probability of getting a value of u larger than $\sqrt{n}\,(x_m - x_{m+1})/\sigma_0$ is less than (1), we shall be right. However, this use of Table 3 will not give much discrimination between means for low ν, and its chief advantage over the use of significant differences obtained from the t distribution is that it does produce the correct pattern of the strong dependence of the test values on m.


4. GUMBEL’S ASYMPTOTIC THEORY 
An interesting comparison is available with the asymptotic theory of the distribution of the mth value developed by Gumbel (1935). He discusses distributions of what he calls exponential type. For F(x) to be of this type we must have approximately for large x

$$\frac{f(x)}{1-F(x)} = -\frac{f'(x)}{f(x)}. \qquad (10)$$

(10) is satisfied by $1 - F(x) = \exp a(x-b)$ ($a<0$, $x>b$). It is not to be expected that it will be a good approximation for the whole range of x for a distribution as different from the exponential as is the normal. However, for our problem the whole range of x is not so important in the evaluation of (1) as is the range around the expected values of the mth and (m+1)th highest values. Gumbel sets $F(u_m) = 1 - m/N$ and expands F(x) about $u_m$ in a Taylor's series, getting, by virtue of (10),

$$F(x) \simeq 1 - \frac{m}{N}\exp\{-(x-u_m)\,f(u_m)\,N/m\}. \qquad (11)$$

Suppose this is a good enough approximation to F(x) for it to be used in the integration leading to (1). Then the probability that $x_m - x_{m+1}$ is greater than u is approximately

$$\exp\{-N u f(u_m)\}. \qquad (12)$$

We give in Table 4 the percentage points calculated from (12) for the normal distribution and the same cases as those of Table 3.
It appears that for the normal distribution (12) is conservative in claiming significance 
but is not accurate enough for ordinary use. If $F(x) = 1 - \exp(-\alpha x)$, (12) is exact.


Table 4. Significance points for $x_m - x_{m+1}$ given by Gumbel's theory

 N   Level   m = 1   m = 2   m = 3   m = 4   m = 5

 3    1%     4.22
      5%     2.75
 4    1%     3.62    2.89
      5%     2.36    1.88
 5    1%     3.29    2.38
      5%     2.14    1.55
 6    1%     3.07    2.11    1.92
      5%     2.00    1.37    1.25
 7    1%     2.92    1.94    1.68
      5%     1.90    1.26    1.09
 8    1%     2.80    1.81    1.52    1.44
      5%     1.82    1.18    0.99    0.94
 9    1%     2.70    1.72    1.41    1.30
      5%     1.76    1.12    0.92    0.84
10    1%     2.62    1.64    1.32    1.19    1.15
      5%     1.71    1.07    0.86    0.78    0.75
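
The entries of Table 4 can be checked numerically. The following is a minimal sketch, assuming a standard normal parent and reading (12) as $\exp\{-Nuf(u_m)\}$ with $F(u_m) = 1 - m/N$; the function name `gumbel_point` is introduced here for illustration only and is not part of the paper.

```python
from math import log
from statistics import NormalDist


def gumbel_point(N, m, level):
    """Return u solving exp(-N * u * f(u_m)) = level for a standard normal
    parent, where u_m is defined by F(u_m) = 1 - m/N (assumed reading of (12))."""
    std = NormalDist()
    u_m = std.inv_cdf(1.0 - m / N)      # characteristic mth highest value
    return -log(level) / (N * std.pdf(u_m))


# Print a few rows: N, m, 1% point, 5% point (rounded to two places).
for N, m in [(4, 1), (4, 2), (6, 2), (10, 5)]:
    print(N, m, round(gumbel_point(N, m, 0.01), 2),
          round(gumbel_point(N, m, 0.05), 2))
```

Because the approximation depends on the parent distribution only through $f(u_m)$, the same function would serve for another parent if `inv_cdf` and `pdf` were replaced accordingly.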


5. OTHER DISTRIBUTION FUNCTIONS

The exponential distribution provides a comparison of the theoretical probability with the known probability (12), which is $\exp(-\alpha m u)$. The approximation by $P_1(u)$ is

$$\binom{N}{m}\sqrt{(2\pi N)}\;p^{m+\frac12}(1-p)^{N-m+\frac12}\exp(-\alpha m u). \qquad (13)$$

This coincides with the correct value if the first term of Stirling's approximation to the gamma function is used for the three terms of the binomial coefficient. The second approximation is (13) multiplied by

$$1 + \frac{a(p)}{N} = 1 + \frac{1-p+p^2}{12Np(1-p)}, \qquad (14)$$

balancing the second term in Stirling's approximation. Again $a(p) = a(1-p)$, so that the correction is still symmetrical about $p = \tfrac12$, although the main term $\exp(-\alpha m u)$ is not symmetrical about $m = \tfrac12 N$. $a(p)/N$ is still small, achieving its highest value 1/12 for p = 1/N and N large. (A similar analysis can be done when $F(x) = \exp(-\exp(-\alpha x))$.) A test that is independent of the value of α is possible if an independent estimate of 1/α is available from, say, the average of r independent variables drawn from the distribution. The sum of r such values, X, say, when multiplied by α has the distribution $(1/\Gamma(r))\exp(-x)\,x^{r-1}\,dx$, and the probability that $(x_m - x_{m+1})/X$ is greater than u is

$$(1+mu)^{-r}. \qquad (15)$$

J. H. DARWIN um 


More usually no such indepen ent sample w] e av S T 
ally no su h ind pi dent pl ill b i i 
"ndi SES al able, and it is perhap: more appro 
priate then to test the ratio Wi] igs for = 1. Instead of equation (3) we have the ti à 
E equation 


pala) | -pfa 

i-F(ex)~ Fe’ m 

when we are trying to find the probability that 2;, is greater than cz,,.,. This is satisf : 
"marl: isfie 


by exp(—2) = pe|(pe+1—p). The first approximation to the probability is 


N " 
(m | V) aen (1 — py (pe + — p) na (17) 


and the correction factor is — 1+ a =p + galpe— 14) 
12Ncp(1— p) (pc 1—p)' (18) 


Where a(p, c) = ca(1— p, 1/c). The biggest value of a/N is again 4. The true value for P(u) is 


AN [(N+me—m 
m mc A (19) 
and the correction factor balances the second term in Stirling's approximation to the terms 


of the denominator of (19). 
5.3. The rectangular distribution


Another distribution that provides a check on the accuracy of P,(u) and P(u) (1 +a/N) 


is the rectangular distribution, F(x) = v in (0, 1). Then P,(w) is 
7 
b ) JGzN) pa -pym (1 - uy, 
m 


a(p | ! -ptpr 
N 12Np(1—p) 
N, Thus for both exponential and rectangular distributions 
ed for the normal distribution the correction factor is less 


(20) 


and as in (14) 


The true value of P(u) is (1 —w) 
and for the range of u investigat 
than 4... 

6. RENORMALIZATION

As suggested by the referee, we can compare the above results with those obtained by renormalizing $P_1(u)$ (see Cox, 1948; Daniels, 1956). We renormalize $P_1(u)$ by dividing by $P_1(0)$. When u = 0, $F(\hat{x}) = 1 - p$, and, as u tends to 0, $a(u)/N$ tends to

$$\frac{1-p+p^2}{12Np(1-p)},$$

which agrees with $1/P_1(0)$ to order 1/N, where

$$P_1(0) = \binom{N}{m}\sqrt{(2\pi N)}\;p^{m+\frac12}(1-p)^{N-m+\frac12}. \qquad (21)$$

In general, $a(u)/N$ seems to decrease as u increases. Hence use of $P_1(u)/P_1(0)$ instead of $P_1(u)(1+a/N)$ will tend to overestimate the probability of a bigger difference than u. This difference between the two, however, is likely to be numerically more serious for $P_1(u)$ near ½ than for $P_1(u)$ near 0, since when $P_1(u)$ tends to zero the difference also tends to zero. This is illustrated by the values of $P_1(u)/P_1(0)$ for the same values of u as in Table 2:

N = 3:   0.9223  0.8464  0.7730  0.7024  0.5111  0.3549  0.0739  0.0087
N = 10:  0.8579  0.7309  0.6185  0.4335  0.1574  0.0119
The pattern of the difference for m ≠ 1 can be gauged from the sizes of the factors $1 + a/N$ and $1/P_1(0)$ for N = 20. These are:

               1 + a/N, decreasing from u = 0.2 to u = 3.0     1/P_1(0)
for m = 1      from 1.0762 to 1.0286                           1.0847
    m = 2      from 1.0379 to 1.0122                           1.0427
    m = 4      from 1.0193 to 1.0048                           1.0221
    m = 6      from 1.0137 to 1.0028                           1.0158
    m = 10     from 1.0108 to 1.0018                           1.0126

The biggest differences between the factors occur for low values of m and high values of u. For higher u, near the significance points, there is little to choose between $P_1(u)$, $P_1(u)(1 + a/N)$ and $P_1(u)/P_1(0)$, since the difference between any pair is, for example, of order (0.05)(1/12) for the 5 % point; but for low values of u the second two are appreciably more accurate than the first.
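
The renormalizing factor $1/P_1(0)$ quoted for N = 20 can be reproduced directly. The sketch below assumes that (21) reads $P_1(0) = \binom{N}{m}\sqrt{(2\pi N)}\,p^{m+1/2}(1-p)^{N-m+1/2}$ with p = m/N, and that the limit of $a(u)/N$ as u tends to 0 is $(1-p+p^2)/\{12Np(1-p)\}$; the function names are illustrative, not the author's.

```python
from math import comb, pi, sqrt


def p1_at_zero(N, m):
    """P_1(0) under the assumed reading of (21), with p = m/N."""
    p = m / N
    return comb(N, m) * sqrt(2 * pi * N) * p ** (m + 0.5) * (1 - p) ** (N - m + 0.5)


def limiting_correction(N, m):
    """Limit of 1 + a(u)/N as u tends to 0 (second Stirling term)."""
    p = m / N
    return 1 + (1 - p + p * p) / (12 * N * p * (1 - p))


# Compare the two factors for N = 20 and the values of m used in the text.
for m in (1, 2, 4, 6, 10):
    print(m, round(1 / p1_at_zero(20, m), 4), round(limiting_correction(20, m), 4))
```

The two columns should agree to order 1/N, which is the sense in which renormalization and the correction factor coincide at u = 0.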


My thanks are due to Miss Mary Chung for help with the calculations. 
RELATION BETWEEN THE DISTRIBUTIONS OF NON-CENTRAL t
AND OF A TRANSFORMED CORRELATION COEFFICIENT


By B. I. HARLEY 
University College, London 


1. If r is the correlation coefficient in a sample of size n drawn randomly from a normal bivariate population with zero correlation, then the quantity $(n-2)^{1/2}\,r/(1-r^2)^{1/2}$ is known to be distributed exactly as Student's t with (n - 2) degrees of freedom.

By a comparison of moments we shall show that if the population correlation, ρ, is not zero, then

$$v = \frac{(n-2)^{1/2}\,r}{(1-r^2)^{1/2}}\;g\{\rho\},$$

where g{ρ} is an appropriately chosen function of ρ, is distributed approximately as non-central t. We define the distribution of non-central t having f degrees of freedom as that of

$$t = \frac{z+\delta}{\sqrt{w}},$$

where z is a unit normal variable having zero expectation, w is an independent variable distributed as $\chi^2/f$ with f degrees of freedom and δ is the non-central parameter. We shall suggest a form for g{ρ} and determine appropriate relations between n and ρ on the one hand and f and δ on the other. Finally, some numerical comparisons will be given and a method indicated whereby the approximate equivalence of the distributions may be used to supplement the tables of the non-central t distribution given by Johnson & Welch (1939).

(1 —72)5 when p= 0. The prob- 


2. Our first objective will be to obtain the moments of r/ 
Ability distribution of r is known to be 
d» (eos (— 7) 
3 (1) 


1-0? 4(n-1) aie 
pr | np) = CA (1 -r je arp)" (1 —nph 


] d^ (= ( 2d by fp) 
By integrating over the range of values of and denoting 555 | (1— rip?) y P. 
n(n—3)!_ _ O(n,p). 


we find that m Ema 
Í m. —aeye-o fr-strp) dr = Lg 
Differentiating with respect to P gives 
n—1)m(n—3)!P _ Qn, p), 


dn-D 


D = 
[1a [rei ^ apo) 


m 212-.- 
Where Q"(n, p) = Da (n, p) for m 


dr = Cn, p) 


Hence 


ie mcr a= poe» P769) 
Ea —Qn 
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If we denote the expectation in a sample of size I: by &,, where I: ean take integral values, then 


" T +1 r (1 —72)kn—s) TETA . 
Salaa) Í a0- Cat, pj OPd 
_ Cl(n,p) 
~ €(n +1, p) 


If (n+ 1) is replaced by n, then 


INE .. Mead 3) 
5s) C(np) ` 


To obtain the second moment we consider 


+19 


D 2 T D r 2 
iaai a [^ E osegenio 


+1 2 
= (55 0 779-0 2-5) a 


2 
= C(n, p) & (i) 
„2 12, 
Hence e ( : ) = Cn-2,p) 4 
lS C(n, p) e 
In general we h € (c - C"(n —m, p) 
g e have "Ia OR) (5) 


since differentiating with respect to p results in effect, to iplyi i ing 
multiplying by r, and increasing 
the power of the differential by one. atic 


From equation (5) the first four moments about the origin of 


—-(n-24. " — 
v= (n—2) asilo) 
are found to be 


A. 20-233 p 
H8 edil = (n—3) -pA " 
4o) = (o) «(72 (, , (n- 1)ge 
Halv) = 6, (v2) = (ca * (=p istop, (7) 
EA isola (n— 2) p 2 
L-d = zia EY 
HO) o) = pio («qe [v (p). (8) 


340D (m-i) 


ah (n— 2) 
talo) = e (v4) = — i 
' ( ü-p) T m ) lgo}. (2) 


(n—4) (n — 6) 


since they are not necessary for 


mean and -ratios, but 
the subsequent work, they threrbeta nati 
to their length. 


Y are not reproduced here owing 
3. Johnson & Welch gave the first three moments of the non-central t distribution, and since w and z are independent it is a straightforward matter to find the fourth moment and the beta-ratios.

We find that, for the moments about t = 0,

$$\mu_1'(t) = \left(\tfrac12 f\right)^{1/2}\frac{\Gamma\{\tfrac12(f-1)\}}{\Gamma(\tfrac12 f)}\,\delta, \qquad (10)$$

$$\mu_2'(t) = \frac{f}{f-2}\,(1+\delta^2), \qquad (11)$$

$$\mu_3'(t) = \left(\tfrac12 f\right)^{1/2}\frac{\Gamma\{\tfrac12(f-1)\}}{\Gamma(\tfrac12 f)}\,\frac{f}{f-3}\,\delta(3+\delta^2), \qquad (12)$$

$$\mu_4'(t) = \frac{f^2}{(f-2)(f-4)}\,(3+6\delta^2+\delta^4), \qquad (13)$$

and from these equations we obtained the moments about the mean and the beta-ratios.


4. To bring the distributions of $r/(1-r^2)^{1/2}$ and of non-central t into correspondence we have to specify a form for the function g{ρ} and to relate n and ρ to f and δ. It is clear that the link could be made in a number of ways. After various methods had been examined that described below was chosen, partly because it led to simple relations which did not involve any approximation for the Gamma function and also because it appeared to provide satisfactory results in the cases where numerical comparisons were made.

From equations (10) and (12) we have

$$\frac{\mu_3'(t)}{\mu_1'(t)} = \frac{f}{f-3}\,(3+\delta^2), \qquad (14)$$

and from equations (6) and (8) it follows that

$$\frac{\mu_3'(v)}{\mu_1'(v)} = \frac{n-2}{n-5}\;\frac{3+(n-3)\rho^2}{1-\rho^2}\;g^2\{\rho\}. \qquad (15)$$

Equating these moment ratios gives

$$\frac{n-2}{n-5}\;\frac{3+(n-3)\rho^2}{1-\rho^2}\;g^2\{\rho\} = \frac{f}{f-3}\,(3+\delta^2). \qquad (16)$$

Next equating the values of $\mu_2'(v)$ and $\mu_2'(t)$ given in equations (7) and (11) we have

$$\frac{n-2}{n-4}\;\frac{1+(n-2)\rho^2}{1-\rho^2}\;g^2\{\rho\} = \frac{f}{f-2}\,(1+\delta^2). \qquad (17)$$

If we take f = n - 2, a result which gives exact correspondence when ρ = 0, then (16) and (17) are satisfied when

$$\delta = \rho\left(\frac{2n-3}{2-\rho^2}\right)^{1/2} \quad\text{and}\quad g\{\rho\} = \left(\frac{2(1-\rho^2)}{2-\rho^2}\right)^{1/2}. \qquad (18)$$

Apart from the second moments about zero, the moments of the two corresponding distributions will not agree exactly, but some results given in Table 1 suggest that the representation of one distribution by the other may nevertheless be quite good.
Table 1. Comparison of the values of the first two moments and $\sqrt{\beta_1}$ and $\beta_2$ of v and t


| | s lu 
p | n è | e) | nO | un) | patty | AQ) VAQ| Bole) | Balt) 
| | | | i | 
| |- | | 
| | 7386 | 3.7352 
0.2 | 15 | 0-742 | 0/7891 | 0-7889 | 1-2103 | 1.2107 | 0-2150 | 0-2139 | 3-7386 3 T 
02 | 27 | 1-020 | 1:0522 | 1.0521 | 11111 | 11113 031361 | 0-1356 | 33121 | 3 
| | | s 
0-6 | 15 | 2-435 | 9-588 | 2-587 1-489 1493 | 0-631 0-615 4300 4 Ma 
0-6 | 27 3-346 | 3-451 3-450 1:346 1-353 0-404 0-369 3-556 627 
| 
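
The t columns of Table 1 can be recomputed from the relation between δ and ρ in (18) and the standard moments of non-central t. The following is a minimal sketch, assuming f = n - 2 and $\delta = \rho\{(2n-3)/(2-\rho^2)\}^{1/2}$; the function names are introduced here for illustration.

```python
from math import exp, lgamma, sqrt


def delta_from_rho(n, rho):
    """Non-central parameter matched to (n, rho), assumed reading of (18)."""
    return rho * sqrt((2 * n - 3) / (2 - rho * rho))


def nct_mean_var(f, delta):
    """Mean and variance of non-central t with f degrees of freedom (f > 2)."""
    c = sqrt(f / 2) * exp(lgamma((f - 1) / 2) - lgamma(f / 2))
    mean = c * delta
    second = f * (1 + delta ** 2) / (f - 2)   # second moment about zero
    return mean, second - mean ** 2


for rho, n in [(0.2, 15), (0.2, 27), (0.6, 15), (0.6, 27)]:
    d = delta_from_rho(n, rho)
    mean, var = nct_mean_var(n - 2, d)
    print(rho, n, round(d, 3), round(mean, 4), round(var, 4))
```

Log-gamma is used for the ratio of Gamma functions so that the sketch stays stable for larger f.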


5. Johnson & Welch considered 
where ô and f are assumed known 


S probability levels c given. i 
; could be made using the norme 
Ve à very good approximation 


n from a normal bivariate population has been tabulated by F. N. David (1938) for values of n = 3 (1) 25, 50, 100, 200, 400 and ρ = 0.0 (0.1) 0.9, and thus by using the approximate connexion between r and t given in §4 we can obtain values of $t_\epsilon$ given δ and f for any probability level and not only those given in Johnson & Welch's tables.

The method of calculation of $t_\epsilon$, given δ, f and ε, is as follows:

(i) Using the relationship between δ and ρ given in equation (18) we have that

$$\rho^2 = \frac{2\delta^2}{2n-3+\delta^2},$$

where n = f + 2, and where from a consideration of the values of $\mu_1'(v)$ and $\mu_1'(t)$ we note that ρ and δ have the same sign.

(ii) Using this value of ρ, and taking n = f + 2, a value $r_\epsilon$ can be found from David's tables such that $P\{r > r_\epsilon\} = \epsilon$.

(iii) Finally, $t_\epsilon$ is calculated from the relationship

$$t_\epsilon = \left\{\frac{2(n-2)(1-\rho^2)}{2-\rho^2}\right\}^{1/2}\frac{r_\epsilon}{(1-r_\epsilon^2)^{1/2}}.$$
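
Steps (i) and (iii) are simple enough to sketch in code. The example below assumes the relations $\rho^2 = 2\delta^2/(2n-3+\delta^2)$ and $g\{\rho\} = \{2(1-\rho^2)/(2-\rho^2)\}^{1/2}$ with n = f + 2; step (ii) still requires a percentage point $r_\epsilon$ of r taken from David's tables (or any other source), so the value passed as `r_eps` below is a hypothetical input, not a tabulated figure.

```python
from math import copysign, sqrt


def rho_from_delta(f, delta):
    """Step (i): correlation matched to (f, delta); rho takes the sign of delta."""
    n = f + 2
    return copysign(sqrt(2 * delta ** 2 / (2 * n - 3 + delta ** 2)), delta)


def t_from_r(f, rho, r_eps):
    """Step (iii): convert a percentage point of r into one of non-central t."""
    n = f + 2
    g = sqrt(2 * (1 - rho ** 2) / (2 - rho ** 2))
    return sqrt(n - 2) * g * r_eps / sqrt(1 - r_eps ** 2)


rho = rho_from_delta(7, 0.553)        # close to 0.2, as in Table 2
print(round(rho, 3))
print(t_from_r(7, rho, r_eps=0.30))   # r_eps = 0.30 is an illustrative value only
```

When ρ = 0 the factor g is 1 and the conversion reduces to the exact Student's t relation quoted in §1.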
Table 2. Comparison of the values of $t_\epsilon$ calculated by three
different methods at various probability levels


] T 
Values of t, 
] 
f ò p € | From 
zac | From T | Johnson & 
approximation, Welch's first 
| approximation 
7 0-553 0-2 0:4 0-843 0-843 0-813 
0-2 1-512 1:513 1:457 
0-05 2-621 2-623 2-542 
0-01 3-90 3:90 3-93 
, 
1:815 0-6 0-4 2:190 2-188 2-105 
0-2 2-992 2-992 2.876 
0-05 4-434 4-436 4-331 
0-01 6-20 6-20 6:45 
Si . 04 3-105 3-100 2-981 
as p^ 0-2 4-030 4-026 3:867 
0-05 5:739 57741 5-628 
0-01 7:87 7:89 8:34 
: 3-699 3-688 3:544 
2196 92 be 4-708 4-701 4-512 
0-05 6-603 6-608 6-488 
ni 8-99 9-01 9-60 
se 
1:096 1-096 1-079 
16 0-821 0-2 oe prm) | gaat 1-699 
Dit 2.074 2-674 2-635 
"0o ; f 
0-01 3-60 $60 N 3:57 
-029 3:027 2.978 
2.691 0-6 04 nn 3-758 3-697 
é 0-2 490 | 492 4-860 
0-05 6-12 | 6-12 6:12 
0-01 
A 4-258 
4.334 4-330 
3-941 0-8 oF 5-159 5:156 a 
E 6-502 6-509 Tak 
0-05 7.91 7-94 7:95 
0-01 
ve 
5173 5153 Aa 
; 0-4 6-063 5 
ame ye 0-2 ed 7546 7.462 
7-538 
0-05 9-10 9-12 9:17 
0-01 
, 




Table IV agree with those calculated using the r approximation, to this accuracy at least, in all cases where ρ = 0.2 and 0.6. For ρ = 0.8 and 0.9 the agreement is not quite as good, but for ε = 0.4, 0.2 and 0.05 there is still only a difference of at most one unit in the third significant figure between the exact value and that from the r approximation. Since Johnson & Welch give fewer decimal places for ε = 0.01, one would expect that for large values of δ the third significant figure may be in error by one or two units, which may account for the seemingly larger discrepancy between the first two methods.

In most cases the method using the probability integral appears to be more accurate than Johnson & Welch's first approximation given in the last column of Table 2. Certainly these results suggest that the probability integral of r could be used to obtain reasonably accurate values of $t_\epsilon$ for those values of ε not given in Johnson & Welch's Table IV.
ON THE SOLUTION OF ESTIMATING EQUATIONS FOR
TRUNCATED AND CENSORED SAMPLES FROM
NORMAL POPULATIONS*


By A. CLIFFORD COHEN, Jr.
The University of Georgia


1. INTRODUCTION AND SUMMARY

To calculate maximum-likelihood estimates of the mean and standard deviation of a normally distributed population from doubly truncated or from doubly censored samples, it is necessary to solve simultaneously a pair of rather complex non-linear estimating equations. Although solutions can be approximated by straightforward iterative procedures, the calculations often become tedious and time-consuming. This paper is concerned with reducing computational labour required in obtaining these solutions. For use in the doubly truncated case, a chart which permits direct reading of the standardized terminals with an accuracy of from three to five units in the second decimal is included. For use in both the doubly truncated and doubly censored cases, an iterative procedure is devised, which, for a specified degree of accuracy, appears to require less computational effort than similar procedures previously proposed. For use in singly censored cases, where only one estimating equation is involved, a chart which permits direct reading of estimates of the standardized terminal with a degree of accuracy comparable to that possible in the doubly truncated case is presented. Since tables recently published by Cohen & Woodward (1953) reduce the calculation of estimates from singly truncated normal samples to the simple task of interpolating between table entries, that case is considered here only by reference.

2. GRAPHIC SOLUTION OF ESTIMATING EQUATIONS FOR DOUBLY TRUNCATED SAMPLES

We let x designate a normally distributed random variable with probability density function

$$f(x) = (\sigma\sqrt{2\pi})^{-1}\exp[-(x-\mu)^2/2\sigma^2] \quad (-\infty < x < \infty). \qquad (1)$$

For a doubly truncated sample consisting of n observations from this population, each of which is subject to the restriction $x_0 \le x \le x_0 + w$, where the sample terminals, $x_0$ and $x_0 + w$, are fixed, the logarithm of the likelihood function is

$$L = -n\ln[I(\xi_2) - I(\xi_1)] - n\ln\sigma - \sum_{i=1}^{n}(x_i-\mu)^2/(2\sigma^2) + \text{const.}, \qquad (2)$$

where

$$I(\xi) = \int_{-\infty}^{\xi}\phi(t)\,dt, \qquad \phi(t) = (2\pi)^{-1/2}\exp(-\tfrac12 t^2), \qquad (3)$$

$$\xi_1 = (x_0-\mu)/\sigma, \qquad \xi_2 = (x_0+w-\mu)/\sigma. \qquad (4)$$

* Sponsored by the Office of Ordnance Research, U.S. Army.
Maximum-likelihood estimating equations obtained by equating to zero the partial derivatives of L with respect to μ and σ respectively were given by Cohen (1950) as

$$\sigma[Z_1 - Z_2 - \xi_1] - \nu_1 = 0, \qquad \sigma^2[1 - \xi_1(Z_1 - Z_2 - \xi_1) - Z_2 w/\sigma] - \nu_2 = 0, \qquad (5)$$

where

$$Z_1 = \frac{\phi(\xi_1)}{I(\xi_2) - I(\xi_1)}, \qquad Z_2 = \frac{\phi(\xi_2)}{I(\xi_2) - I(\xi_1)}, \qquad (6)$$

$$\nu_k = \sum_{i=1}^{n}(x_i - x_0)^k/n. \qquad (7)$$

No information is assumed to be available about observations which might have been eliminated as a consequence of the restrictions on observation of x.

A procedure based on two-way interpolation was employed by Cohen (1950) to solve equations (5) simultaneously for maximum-likelihood estimates, $\hat\sigma$ and $\hat\xi_1$. With these values determined, $\hat\mu$ follows from (4) as

$$\hat\mu = x_0 - \hat\sigma\hat\xi_1. \qquad (8)$$

Throughout this paper, the symbol (^) serves to distinguish maximum-likelihood estimates from parameters estimated.

Expressing σ as

$$\sigma = w/(\xi_2 - \xi_1), \qquad (9)$$


a result which follows from (4), further reduction of estimating equations (5) yields

$$\frac{Z_1 - Z_2 - \xi_1}{\xi_2 - \xi_1} - \frac{\nu_1}{w} = 0, \qquad \frac{1 + \xi_1 Z_1 - \xi_2 Z_2 - (Z_1 - Z_2)^2}{(\xi_2 - \xi_1)^2} - \frac{s^2}{w^2} = 0, \qquad (10)$$

where $s^2$ is the truncated sample variance ($= \nu_2 - \nu_1^2$). The two equations of (10) are thus of the form $F_i(\xi_1, \xi_2) = H_i(\xi_1, \xi_2) - K_i = 0$ (i = 1, 2), where for a given sample the $K_i$ are constants.

The above form for the second equation of (10) was suggested by Mr George W. Thomson, who, together with Friedman and Garelis (1954), tabulated $H_1(\xi_1, \xi_2)$ and $H_2(\xi_1, \xi_2)$ at intervals of 0.5 for the two arguments. As a means of circumventing interpolation difficulties imposed by the large tabular intervals and in order to further facilitate the solution of (10), the two families of curves, $F_1(\xi_1, \xi_2) = 0$ and $F_2(\xi_1, \xi_2) = 0$, were plotted for selected values of $\nu_1/w$ and $s^2/w^2$, and are presented here as Fig. 1.* Co-ordinates for points on these curves were read from large-scale graphs of $H_i(\xi_1, \xi_2)$ carefully plotted from the Thomson, Friedman & Garelis (1954) tables as functions of $\xi_1$ for the family of $\xi_2$ values.

The fact that $H_1$ and $H_2$ obey the relations

$$H_1(\xi_1, \xi_2) = \frac{Z_1 - Z_2 - \xi_1}{\xi_2 - \xi_1} = 1 - H_1(-\xi_2, -\xi_1),$$

$$H_2(\xi_1, \xi_2) = \frac{1 + \xi_1 Z_1 - \xi_2 Z_2 - (Z_1 - Z_2)^2}{(\xi_2 - \xi_1)^2} = H_2(-\xi_2, -\xi_1), \qquad (11)$$

permitted one-half the points required in plotting Fig. 1 to be obtained by reflexion. From the first of the above relations, it follows that the graph of $H_1(\xi_1, \xi_2) - K = 0$ is the reflexion of the graph of $H_1(\xi_1, \xi_2) - (1 - K) = 0$ about the line $\xi_1 + \xi_2 = 0$, while from the second it follows that the graph of $H_2(\xi_1, \xi_2) - K = 0$ is symmetrical about this line.
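
The functions $H_1$ and $H_2$ and the reflexion relations (11) are easy to check numerically. The following is a minimal sketch, assuming the definitions $Z_1 = \phi(\xi_1)/[I(\xi_2) - I(\xi_1)]$ and $Z_2 = \phi(\xi_2)/[I(\xi_2) - I(\xi_1)]$; the Python names are illustrative and do not reproduce the Thomson, Friedman & Garelis tables.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf is phi, cdf is I


def z_terms(xi1, xi2):
    """Return (Z1, Z2) for standardized terminals xi1 < xi2."""
    mass = _STD.cdf(xi2) - _STD.cdf(xi1)
    return _STD.pdf(xi1) / mass, _STD.pdf(xi2) / mass


def H1(xi1, xi2):
    """Population value of (mean - left terminal) / (distance between terminals)."""
    z1, z2 = z_terms(xi1, xi2)
    return (z1 - z2 - xi1) / (xi2 - xi1)


def H2(xi1, xi2):
    """Population value of variance / (distance between terminals) squared."""
    z1, z2 = z_terms(xi1, xi2)
    return (1 + xi1 * z1 - xi2 * z2 - (z1 - z2) ** 2) / (xi2 - xi1) ** 2


# Reflexion check: both differences should be zero up to rounding error.
print(H1(-0.5, 2.0) - (1 - H1(-2.0, 0.5)))
print(H2(-0.5, 2.0) - H2(-2.0, 0.5))
```

For symmetric truncation, for example terminals at -1 and 1, $H_1$ is exactly one-half, since the mean of the central block lies midway between the terminals.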


: V de 
m Mr i gu. Op in the preparation of this char 
ies of the original 8-figure tables are a; * ined Of 
request from Mr G. W. Thomson, Ethyl Corporation, 1600 West [^ x gees 
Michigan, U.S.A. 
With $\nu_1/w$ and $s^2/w^2$ computed for a given sample, the intersection of $F_1(\xi_1, \xi_2) = 0$ and $F_2(\xi_1, \xi_2) = 0$ is located on Fig. 1. Co-ordinates of this point, which can be read to within three to five units in the second decimal, are the required values of $\hat\xi_1$ and $\hat\xi_2$. This degree of accuracy is adequate for many practical applications, but when more precise results are necessary, values thus read serve as first approximations to be improved through iteration.

[Fig. 1. Estimation curves for doubly truncated samples. (1) Locate curve corresponding to sample value of $\nu_1/w$. (2) Follow curve thus located to its intersection with curve which corresponds to sample value of $s^2/w^2$. (3) Co-ordinates of intersection determined above, which may be read on scales along base and left edge of chart, are the required values of $\xi_1$ and $\xi_2$.]

With $\hat\xi_1$ and $\hat\xi_2$ determined, estimates of the mean and standard deviation follow from (4) as

$$\hat\sigma = w/(\hat\xi_2 - \hat\xi_1), \qquad \hat\mu = x_0 - \hat\sigma\hat\xi_1.$$
Although estimating equations (10) were derived by the method of maximum likelihood, these same equations also result from equating the first and second truncated sample moments to corresponding population moments. In standardized population (complete distribution) units, the mean of the population central block (truncated distribution) from the population mean may be written as

$$\alpha_1 = \frac{1}{I(\xi_2) - I(\xi_1)}\int_{\xi_1}^{\xi_2} t\,\phi(t)\,dt = Z_1 - Z_2,$$

where $\alpha_k$ designates the kth standardized moment of the truncated distribution from the population mean in population standard units. The mean of the truncated distribution about the left terminus is

$$\alpha_1 - \xi_1 = Z_1 - Z_2 - \xi_1.$$

The foregoing may be made clearer by reference to Fig. 2.

[Fig. 2. Sketch of the normal curve marking the origin 0, the mean of the population central block and the terminals.]

The first equation of (10) is then the result obtained by equating the ratio of (i) distance of sample mean from $x_0$ to (ii) distance between truncation points, w, to the corresponding population ratio, $(Z_1 - Z_2 - \xi_1)/(\xi_2 - \xi_1)$, otherwise denoted as $H_1(\xi_1, \xi_2)$.

The second moment of the central block in a standardized normal population about the mean of the whole population is

$$\alpha_2 = \frac{1}{I(\xi_2) - I(\xi_1)}\int_{\xi_1}^{\xi_2} t^2\phi(t)\,dt = 1 + \xi_1 Z_1 - \xi_2 Z_2.$$

The variance of the central block of the population is therefore

$$\alpha_2 - \alpha_1^2 = 1 + \xi_1 Z_1 - \xi_2 Z_2 - (Z_1 - Z_2)^2.$$

The second equation of (10) then results from equating the ratio of (i) sample variance to (ii) the square of the distance between truncation points, $w^2$, to the corresponding population ratio $[1 + \xi_1 Z_1 - \xi_2 Z_2 - (Z_1 - Z_2)^2]/(\xi_2 - \xi_1)^2$, otherwise denoted as $H_2(\xi_1, \xi_2)$.
3. ITERATIVE SOLUTIONS


When solutions i i i 

Si sega hp eninge OE gland Squat Dip cate qe it 
un quar trii Meme $82 > quired accuracy, the standard maximum- 
SUS procedure (Newton's method) may be employed to compute corrections to th 
initial values. However, since the second partial derivatives of the likelihood function must be eval ted, 
each cycle of the iterative process is likely to prove tediousand laborious. Furthermore, SEGONS T 5 
of neglecting second and higher powers of the corrections, Newton’s method tends to modes 
slowly converging iterants during the first few cycles unless initial approximations are ina close neigh- 
bourhood of the solution. This difficulty has been recognized and discussed, for example, by icu ed 
(1956). In an effort to overcome these objections, the method of successive substitutions 8s described 


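by Scarborough (1930, pp. 191-5) has been employed to develop iterants of the form

$$\xi_1 = f(\xi_1, \xi_2), \qquad \xi_2 = g(\xi_1, \xi_2). \qquad (13)$$

On clearing of fractions, equations (10) become

$$Z_1 - Z_2 - \xi_1 = (\xi_2 - \xi_1)\,\nu_1/w,$$
$$1 + \xi_1 Z_1 - \xi_2 Z_2 - (Z_1 - Z_2)^2 = (\xi_2 - \xi_1)^2\,s^2/w^2. \qquad (14)$$

Using the identity

$$1 + \xi_1 Z_1 - \xi_2 Z_2 - (Z_1 - Z_2)^2 = 1 - (Z_1 - Z_2 - \xi_1)(Z_1 - Z_2) - (\xi_2 - \xi_1)Z_2,$$

the two equations of (14) may be combined to give

$$(s/w)^2(\xi_2 - \xi_1)^2 + [(Z_1 - Z_2)\,\nu_1/w + Z_2](\xi_2 - \xi_1) - 1 = 0, \qquad (15)$$

which is quadratic in $\xi_2 - \xi_1$.

On applying the quadratic formula, it follows from (15) that

$$\xi_2 - \xi_1 = \{-[(Z_1 - Z_2)\,\nu_1/w + Z_2] + \sqrt{([(Z_1 - Z_2)\,\nu_1/w + Z_2]^2 + 4s^2/w^2)}\}\,w^2/(2s^2), \qquad (16)$$

where the positive sign is taken with the radical since $\xi_2 - \xi_1 > 0$. The first equation of (14) is solved for $\xi_1$ to obtain

$$-\xi_1 = \frac{\xi_2\,\nu_1/w - (Z_1 - Z_2)}{1 - \nu_1/w}. \qquad (17)$$

With this result substituted into (16), it follows that

$$\xi_2 = (Z_1 - Z_2) + \{[(Z_1 - Z_2)\,\nu_1/w + Z_2] - \sqrt{([(Z_1 - Z_2)\,\nu_1/w + Z_2]^2 + 4s^2/w^2)}\}\,w(\nu_1 - w)/(2s^2). \qquad (18)$$

Since (17) and (18) are of the forms specified by (13), they may be used to iterate to the required values $\hat\xi_1$ and $\hat\xi_2$ by successive substitutions as follows:

$$\xi_2^{(i+1)} = (Z_1^{(i)} - Z_2^{(i)}) + \{[(Z_1^{(i)} - Z_2^{(i)})\,\nu_1/w + Z_2^{(i)}] - \sqrt{([(Z_1^{(i)} - Z_2^{(i)})\,\nu_1/w + Z_2^{(i)}]^2 + 4s^2/w^2)}\}\,w(\nu_1 - w)/(2s^2),$$

$$\xi_1^{(i+1)} = -\frac{\xi_2^{(i+1)}\,\nu_1/w - (Z_1^{(i)} - Z_2^{(i)})}{1 - \nu_1/w}. \qquad (19)$$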
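
One cycle of the successive-substitution iterants can be sketched as follows. This is a minimal sketch under stated assumptions: $Z_1$ and $Z_2$ are evaluated at the current approximations, the difference $\xi_2 - \xi_1$ is taken as the positive root of the quadratic (15), and $\xi_1$ then follows from the first equation of (14); whether the printed form of (19) updates the two terminals in exactly this order is an assumption. The helper `population_ratios` is introduced only to generate consistent test inputs.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def z_terms(xi1, xi2):
    """Return (Z1, Z2) for standardized terminals xi1 < xi2."""
    mass = _STD.cdf(xi2) - _STD.cdf(xi1)
    return _STD.pdf(xi1) / mass, _STD.pdf(xi2) / mass


def population_ratios(xi1, xi2):
    """Values of nu1/w and s^2/w^2 implied exactly by given terminals."""
    z1, z2 = z_terms(xi1, xi2)
    d = xi2 - xi1
    return (z1 - z2 - xi1) / d, (1 + xi1 * z1 - xi2 * z2 - (z1 - z2) ** 2) / d ** 2


def substitution_step(xi1, xi2, v1_over_w, s2_over_w2):
    """One cycle: new (xi1, xi2) from the current pair and the sample ratios."""
    z1, z2 = z_terms(xi1, xi2)
    b = (z1 - z2) * v1_over_w + z2
    # Positive root of s2_over_w2 * d^2 + b * d - 1 = 0, where d = xi2 - xi1.
    d = (-b + sqrt(b * b + 4 * s2_over_w2)) / (2 * s2_over_w2)
    new_xi1 = (z1 - z2) - d * v1_over_w
    return new_xi1, new_xi1 + d


# Illustrative run: ratios generated from terminals (-0.5, 1.5), rough start.
r1, r2 = population_ratios(-0.5, 1.5)
xi = (-0.3, 1.2)
for cycle in range(5):
    xi = substitution_step(xi[0], xi[1], r1, r2)
    print(cycle + 1, xi)
```

The true terminals are a fixed point of the step by construction; how quickly a rough start approaches them is left to inspection of the printed cycles rather than asserted here.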
With $\xi_1^{(0)}$ and $\xi_2^{(0)}$ determined from the chart of Fig. 1, improved approximations can be obtained by the successive application of (19). In many applications it turns out that these iterants result in a rapid advance toward the neighbourhood of the solution during the first cycle or two. Thereafter, convergence of succeeding iterants slows down as the solution is approached. This behaviour is opposite to that of Newton's method, for which convergence of successive iterants is sometimes slow for values very far from the solution. The two methods of iteration thus complement each other, and in applications where a high degree of accuracy in the solution of the estimating equations is required, an efficient computing procedure consists of reading initial approximations $\xi_1^{(0)}$ and $\xi_2^{(0)}$ from Fig. 1, advancing to the neighbourhood of the solution with one or more cycles of (19), and obtaining final estimates with a single cycle of Newton's method. By using Newton's method for the final cycle, estimates of the variances are obtained simultaneously without the additional computational effort which would otherwise be necessary in evaluating the second derivatives separately.

Newton's method. This method of iteration is based on Taylor series expansions of estimating equations in the vicinity of the solution. With values $\mu_0$ and $\sigma_0$ designating approximate solutions to the estimating equations, $\partial L/\partial\mu = 0$ and $\partial L/\partial\sigma = 0$, we write

$$\mu = \mu_0 + h, \qquad \sigma = \sigma_0 + k. \qquad (20)$$
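By Taylor's theorem, neglecting powers of h and k above the first, we have

$$h\,\frac{\partial^2 L}{\partial\mu_0^2} + k\,\frac{\partial^2 L}{\partial\mu_0\,\partial\sigma_0} = -\frac{\partial L}{\partial\mu_0},$$

$$h\,\frac{\partial^2 L}{\partial\mu_0\,\partial\sigma_0} + k\,\frac{\partial^2 L}{\partial\sigma_0^2} = -\frac{\partial L}{\partial\sigma_0}. \qquad (21)$$

Corrections h and k are then obtained as the simultaneous solution of (21).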
For a doubly truncated sample, the coefficients of (21), obtained by 


h 


differentiating (2), are 


aL n Z z 

gs = ilH 0 Zl 

9L 13 = = 

ig = siet Guo - o1 EZ, - EZ, 

eLn F F F F 99 
S67 al AAT, - JL (22) 


en "[?(:—h) 4, 4 z a 5 5 
RT A A gr] 
er n [ 3(5? + (z — u)2) = = a - 
oi --z[ gà O46 782+ 82 ,— az. 

where Z (= 2) +,) has been written for the sample mean. 

Newton’s method has been set up here in terms of and ø rather 
substitutions of (19). Thereby, asymptotic variances of the estimates 
the second partial derivatives from (22), as evaluated for the final cy 


since these values closely approximate corresponding ex 
exact calculation of the variances, 


cle of iteration, This is permissible, 


4. DOUBLY CENSORED SAMPLES

In this section we are concerned with doubly censored samples from a population with probability density function (1), such that out of a fixed total of N random observations, there are $n_1$ for which it is known only that $x < x_0$, $n_2$ for which it is known only that $x > x_0 + w$, and $n = N - n_1 - n_2$ fully measured observations in the interval $x_0 \le x \le x_0 + w$. For a sample of this type, the logarithm of the likelihood function is

$$L = n_2\ln[1 - I(\xi_2)] + n_1\ln I(\xi_1) - n\ln\sigma - \sum_{i=1}^{n}(x_i - \mu)^2/(2\sigma^2) + \text{const.} \qquad (23)$$

This case also was considered by Cohen (1950), and the estimating equations obtained by setting $\partial L/\partial\mu = 0$ and $\partial L/\partial\sigma = 0$ may be expressed as

$$\frac{Y_1 - Y_2 - \xi_1}{\xi_2 - \xi_1} - \frac{\nu_1}{w} = 0, \qquad \frac{1 + \xi_1 Y_1 - \xi_2 Y_2 - (Y_1 - Y_2)^2}{(\xi_2 - \xi_1)^2} - \frac{s^2}{w^2} = 0, \qquad (24)$$

where

$$Y_1(\xi_1) = \frac{n_1}{n}\,\frac{\phi(\xi_1)}{I(\xi_1)} = \frac{n_1}{n}\,Z(-\xi_1), \qquad Y_2(\xi_2) = \frac{n_2}{n}\,\frac{\phi(\xi_2)}{1 - I(\xi_2)} = \frac{n_2}{n}\,Z(\xi_2), \qquad (25)$$

with Z(ξ) designating the reciprocal of Mills' ratio, i.e.

$$Z(\xi) = \phi(\xi)/[1 - I(\xi)]. \qquad (26)$$

Tables of this ratio and of related functions recently appeared as an Editorial (1955) in this journal. In these tables, Z(ξ) is denoted as Z/P and Z(-ξ) as Z/Q, with X written for the standardized normal deviate rather than ξ.

Equations (24) are of the same form as equations (10), with $Y_1$ and $Y_2$ in place of $Z_1$ and $Z_2$ respectively.

In §2 it was pointed out that for doubly truncated samples the method of maximum likelihood yields the same estimating equations as the method of moments. The same is true in the case of censored samples if we suppose that the $n_1$ and $n_2$ censored sample observations have the same first and second moments as the integrated moments for the tails of a normal distribution containing proportions $n_1/N$ and $n_2/N$ of the whole. This latter observation was made by Des Raj (1953), who described his method as one of modified moments.

Since $Y_1$ and $Y_2$ involve not only $\xi_1$ and $\xi_2$ but also $n_1$, n and $n_2$, charts corresponding to Fig. 1 do not seem practical in this case. We must accordingly expect to start with less accurate initial approximations than in the doubly truncated case, and depend on iteration for improving these approximations to the required accuracy.

When $n_1$ and $n_2$ constitute only a small proportion of the sample, they may be neglected to the extent that initial approximations to $\xi_1$ and $\xi_2$ can be read from Fig. 1 as for a doubly truncated sample. When $n_1$ and $n_2$ are appreciable proportions of the total sample, better first approximations might be read from tables of the normal curve areas using the relations

$$\frac{n_1}{n_1 + n_2 + n} = \int_{-\infty}^{\xi_1} \phi(t)\,dt, \qquad \frac{n_2}{n_1 + n_2 + n} = \int_{\xi_2}^{\infty} \phi(t)\,dt. \tag{27}$$
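Relations (27) amount to reading two normal deviates from the tail proportions. As an illustrative sketch only (the helper name is hypothetical, and the inverse normal function of the Python standard library stands in for tables of normal curve areas):

```python
from statistics import NormalDist

def initial_xi(n1, n, n2):
    """First approximations to xi_1 and xi_2 from relations (27).
    Assumes n1 > 0 and n2 > 0 so that both tail areas are non-zero."""
    total = n1 + n + n2
    std = NormalDist()
    xi1 = std.inv_cdf(n1 / total)        # lower tail holds n1/N
    xi2 = std.inv_cdf(1.0 - n2 / total)  # upper tail holds n2/N
    return xi1, xi2

# Counts of the doubly censored example of section 6.
xi1, xi2 = initial_xi(2, 40, 5)
```

For the counts of the doubly censored example this gives values close to the $-1.72$ and $1.25$ quoted there.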


For reasonably large samples, the two sets of initial approximations suggested above should be in fairly close agreement, and either set might be satisfactory as a starting point in the iterative process. When appreciable differences exist between the two sets of values, the choice of starting point might appropriately be based in some measure on the magnitude of $n_1$ and $n_2$ relative to the total sample size as already mentioned. Under some circumstances an average of the two might be preferred.
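Iteration procedures given in §3 for doubly truncated samples also apply here with slight modification. The iterants of (19) are applicable in this case when $Y_1$ and $Y_2$ are substituted for $Z_1$ and $Z_2$ respectively. Likewise Newton's method is applicable where the coefficients of (21) are obtained from (23) by differentiation. With $Y_1$ and $Y_2$ defined by (25) and $Z(\xi)$ defined by (26), these coefficients are

$$\begin{aligned}
\frac{\partial L}{\partial \mu} &= \frac{n}{\sigma^2}\left[(\bar{x} - \mu) - \sigma(Y_1 - Y_2)\right],\\
\frac{\partial L}{\partial \sigma} &= \frac{n}{\sigma^3}\left[s^2 + (\bar{x} - \mu)^2 - \sigma^2(1 + \xi_1 Y_1 - \xi_2 Y_2)\right],\\
\frac{\partial^2 L}{\partial \mu^2} &= -\frac{1}{\sigma^2}\left[n_1 \lambda_1(-\xi_1) + n_2 \lambda_1(\xi_2) + n\right],\\
\frac{\partial^2 L}{\partial \mu\,\partial \sigma} &= \frac{1}{\sigma^2}\left[n_1 \lambda_2(-\xi_1) - n_2 \lambda_2(\xi_2) - \frac{2n(\bar{x} - \mu)}{\sigma}\right],\\
\frac{\partial^2 L}{\partial \sigma^2} &= -\frac{1}{\sigma^2}\left[n_1 \lambda_3(-\xi_1) + n_2 \lambda_3(\xi_2) - n + \frac{3n\{s^2 + (\bar{x} - \mu)^2\}}{\sigma^2}\right],
\end{aligned} \tag{28}$$

where

$$\lambda_1(\xi) = Z'(\xi) = Z(\xi)[Z(\xi) - \xi], \qquad \lambda_2(\xi) = Z(\xi) + \xi\lambda_1(\xi), \qquad \lambda_3(\xi) = \xi[Z(\xi) + \lambda_2(\xi)]. \tag{29}$$

As written above, $Z'$ is the derivative of $Z$ with respect to $\xi$.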
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5. SINGLY CENSORED SAMPLES 


For a singly censored sample in which, out of a fixed total of $N$ observations, there are $n$ measured observations for which $x > x_0$ and $n_1 = N - n$ unmeasured observations for which it is known only that $x < x_0$, the estimating equations as given by Cohen (1950) are

$$\frac{1 - \hat\xi(Y - \hat\xi)}{(Y - \hat\xi)^2} - \frac{\nu_2}{\nu_1^2} = 0, \qquad \hat\sigma = \frac{\nu_1}{Y - \hat\xi}, \qquad \hat\mu = x_0 - \hat\sigma\hat\xi. \tag{30}$$

These results follow from (24) when we let $\xi_2 \to \infty$ and subsequently drop the subscripts from $\xi_1$ and $Y_1$. The first equation of (30) can be solved for $\hat\xi$. Then $\hat\sigma$ follows from the second, with $\hat\mu$ being determined from the third as in the doubly restricted cases.

If, however, $(Y - \hat\xi)$ is eliminated between the first two equations of (30), we thereby obtain

$$\frac{1 - \hat\xi(\nu_1/\hat\sigma)}{(\nu_1/\hat\sigma)^2} = \frac{\nu_2}{\nu_1^2}, \tag{31}$$

and it follows that

$$\hat\xi = \frac{\hat\sigma^2 - \nu_2}{\nu_1\hat\sigma}. \tag{32}$$

Upon substituting this value for $\hat\xi$ into the third equation of (30), we have*

$$\hat\mu = x_0 - (\hat\sigma^2 - \nu_2)/\nu_1. \tag{33}$$

Thus as an alternate procedure for calculating estimates, we might solve the first equation of (30) for $(Y - \xi)$ rather than for $\hat\xi$, then calculate $\hat\sigma$ using the second equation of (30) as before, and calculate $\hat\mu$ from (33).

Since $Y(\xi) = (n_1/n) Z(-\xi)$ in the notation of equation (26), the Biometrika Editorial tables (1955) or tables of normal curve areas and ordinates permit ready evaluation of this function, and it has been possible to prepare a chart, included here as Fig. 3, of the family of curves

$$\nu_2/\nu_1^2 = [1 - \xi(Y - \xi)]/(Y - \xi)^2,$$

plotted as functions of $\xi$ for the various values of $h$, where

$$h = n_1/(n + n_1). \tag{34}$$
To the extent of the range covered, tables of Hald (1949) were employed in plotting these curves. For values beyond the range of Hald's tables, computations were based directly on tables of the normal curve. In using the chart of Fig. 3, $\nu_2/\nu_1^2$ and $h$ are computed from the sample data, and with these values known, $\hat\xi$ is read along the horizontal scale. When more accurate results are required, $[1 - \xi(Y - \xi)]/(Y - \xi)^2$ can be calculated for additional values of $\xi$ so that a more accurate determination of $\hat\xi$ or of $(Y - \hat\xi)$ can be obtained through interpolation.

The function $(Y - \xi)$ has been tabulated by Gupta (1952) as a function of $p$ and $\psi$, where $p = 1 - h$ and $\psi = s^2/\nu_1^2$. Corresponding to observed values of $p$ and $\psi$, $(Y - \xi)$, which Gupta denotes as $z$, is obtained by interpolation from his table. Estimates of the population standard deviation and mean are then calculated from the second equation of (30) and from (33) respectively. Gupta's procedure for calculating estimates accordingly enjoys the advantage of requiring interpolation in only one table. At any rate in parts of his table, however, his tabular intervals seem too large for easy interpolation.

* This result was given in an equivalent form by Gupta (1952). Expressed in the notation of this paper, his estimator may be written as $\hat\mu = \bar{x} - (\hat\sigma^2 - s^2)/(\bar{x} - x_0)$.


[Fig. 3. Estimation curves for singly censored samples.]

[* It ... Gupta as to be worth considering whether this disadvantage might be overcome by an expansion of his table, using finer tabular intervals. Ed.]

6. ILLUSTRATIVE EXAMPLES

Doubly truncated sample. To illustrate estimation in the doubly truncated case, we consider an example in which the entire production of a certain bushing is sorted through go/no-go gauges, with the result that items of diameter in excess of 0.6015 in. and those less than 0.5985 in. are discarded. For a random sample of 75 bushings selected from the screened production,

$\bar{x} = 0.60014933$ in., $s^2 = 0.000000371187$, $x_0 = 0.5985$ and $w = 0.0030$.
Thus $\nu_1 = \bar{x} - x_0 = 0.00164933$, $\nu_1/w = 0.54978$, $s^2/w^2 = 0.041242$,

and visual interpolation between the curves of Fig. 1 gives $\xi_1 = -2.50$ and $\xi_2 = 2.00$ as initial approximations. These values might conceivably be accurate enough for the purpose of this sample. However, as a demonstration of the iterative processes described in §3, we proceed to determine more accurate solutions of the estimating equations using those methods. Accordingly, two cycles of the iterants of (19) yield


$\xi_2^{(1)} = 1.997$, $\xi_1^{(1)} = -2.522$ and $\xi_2^{(2)} = 1.997$, $\xi_1^{(2)} = -2.525$,
respectively. For final estimates, we employ Newton's method with

$$\sigma_0 = w/(\xi_2 - \xi_1) = 0.0030/(1.997 + 2.525) = 0.00066342$$

and

$$\mu_0 = 0.5985 - (0.00066342)(-2.525) = 0.60017514.$$

With the derivatives of (22) evaluated for these values of $\mu_0$ and $\sigma_0$ ($\xi_1 = -2.525$, $\xi_2 = 1.997$), corrections $h$ and $k$ are obtained by solving (21) which for this sample becomes

$$-150706576\,h + 27235297\,k = -7.076300,$$
$$27235297\,h - 187700371\,k = 75.019711.$$

Thereby we obtain $h = -0.00000003$, $k = -0.00000040$, and as final estimates we have

$$\hat\mu = 0.60017514 - 0.00000003 = 0.60017511,$$
$$\hat\sigma = 0.00066342 - 0.00000040 = 0.00066302.$$
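The Newton correction step for this sample is a pair of linear equations in $h$ and $k$. The sketch below solves them by Cramer's rule using the coefficients quoted above; the helper name `solve_2x2` is hypothetical and is not part of the original paper.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*h + a12*k = b1, a21*h + a22*k = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    h = (b1 * a22 - a12 * b2) / det
    k = (a11 * b2 - a21 * b1) / det
    return h, k

# Coefficients of the correction equations for the doubly truncated example.
h, k = solve_2x2(-150706576.0, 27235297.0,
                 27235297.0, -187700371.0,
                 -7.076300, 75.019711)

mu_hat = 0.60017514 + h      # corrected estimate of the mean
sigma_hat = 0.00066342 + k   # corrected estimate of the standard deviation
```

The corrections are several orders of magnitude smaller than the estimates themselves, which suggests that the two cycles of (19) had already brought the approximations close to the solution.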

Since coefficients of the correction equations in $h$ and $k$ are approximately equal to expected values of corresponding second partial derivatives of $L$, the asymptotic variance-covariance matrix of $\hat\mu$ and $\hat\sigma$ may, with very little additional effort, be approximated as

$$\begin{pmatrix} 150706576 & -27235297 \\ -27235297 & 187700371 \end{pmatrix}^{-1} = \begin{pmatrix} 5.471 \times 10^{-9} & -98.87 \times 10^{-9} \\ -98.87 \times 10^{-9} & 6.814 \times 10^{-9} \end{pmatrix}.$$

Thus, $V(\hat\mu) \sim 0.000000005471$, $V(\hat\sigma) \sim 0.000000006814$ and $\text{Cov}(\hat\mu, \hat\sigma) \sim -0.00000009887$.


...val time in days and is assumed to be normally distributed $(\mu, \sigma)$,

$n_1 = 2$, $n = 40$, $n_2 = 5$,

$x_0 = 1.301030$, $w = 0.602060$, $\bar{x} = 1.620111$,
$\nu_1 = 0.319081$, $s^2 = 0.0217392$, $\nu_1/w = 0.529989$ and $s^2/w^2 = 0.0599741$.

Neglecting knowledge of $n_1$ and $n_2$, we read $\xi_1 = -1.71$ and $\xi_2 = 1.36$ from Fig. 1. Subsequently we apply (27) to obtain $\xi_1 = -1.72$ and $\xi_2 = 1.25$ from tables of normal curve areas. The two sets of values are in reasonably close agreement, and we decide to begin with initial values of $\xi_1^{(0)} = -1.72$, $\xi_2^{(0)} = 1.28$. The iterants of (19) with $Z_1$ and $Z_2$ replaced by $Y_1$ and $Y_2$ respectively are employed to obtain the results tabulated below.

[Table of successive iterates $\xi_1^{(i)}$, $\xi_2^{(i)}$.]


As a demonstration of Newton's method in this case, we start with the results obtained at the end of the second cycle, that is, with $\xi_1^{(2)} = -1.689$ and $\xi_2^{(2)} = 1.282$. Accordingly

$$\sigma_0 = 0.602060/(1.282 + 1.689) = 0.202646$$

and

$$\mu_0 = 1.301030 - (0.202646)(-1.689) = 1.643299.$$

With derivatives of (28) calculated for these values of $\mu_0$ and $\sigma_0$, equations (21) become

$$-1117.362555\,h + 52.976746\,k = 0.00956735,$$
$$52.976746\,h - 1791.254507\,k = -0.23456759.$$

On solution, we obtain $h = -0.00000236$ and $k = 0.00013089$, and as final estimates we have

$$\hat\mu = 1.643299 - 0.000002 = 1.643297,$$
$$\hat\sigma = 0.202646 + 0.000131 = 0.202777,$$

which correspond to $\hat\xi_1 = -1.68790$ and $\hat\xi_2 = 1.281$. It is to be noted that when rounded to three decimals, these two latter values agree with those obtained using the iterants of (19).

Using coefficients of the above correction equations, the asymptotic variance-covariance matrix of $\hat\mu$ and $\hat\sigma$ may be approximated as

$$\begin{pmatrix} 1117.3626 & -52.9767 \\ -52.9767 & 1791.2545 \end{pmatrix}^{-1} = \begin{pmatrix} 0.0005591 & -0.0000265 \\ -0.0000265 & 0.0008962 \end{pmatrix}.$$

Thus we have $V(\hat\mu) \sim 0.0005591$, $V(\hat\sigma) \sim 0.0008962$ and $\text{Cov}(\hat\mu, \hat\sigma) \sim -0.0000265$.

Singly censored sample. To illustrate estimation from a singly censored sample, we consider one for which $x_0 = 70.00$, $n = 50$, $n_1 = 3$, $\nu_1 = 10.654$, $\nu_2 = 145.2426$. Thus, $h = 3/(3 + 50)$ and $\nu_2/\nu_1^2 = 1.27958$. Reading from Fig. 3, we obtain $\hat\xi$ as a first approximation. This approximation can readily be improved by calculating additional values of $[1 - \xi(Y - \xi)]/(Y - \xi)^2$ using tables of normal curve areas and ordinates, and then interpolating as summarized below. When the Biometrika Editorial tables (1955) are available, the required values of $Z(-\xi)$ can be read directly without the necessity of the additional calculations involved in using normal curve tables.

    xi        Y - xi     [1 - xi(Y - xi)]/(Y - xi)^2
  -1.560     1.67939     ...
  -1.569     1.68899     ...
  -1.570     1.68990     ...

With $\hat\xi = -1.569$ and $(Y - \hat\xi) = 1.68899$, we estimate $\sigma$ and $\mu$ from the second and third equations of (30) as

$$\hat\sigma = 10.654/1.68899 = 6.308,$$
$$\hat\mu = 70.00 - (-1.569)(6.308) = 79.90.$$

Alternately, we might have estimated $\mu$ from (33) to obtain the same value as above:

$$\hat\mu = 70.00 - (6.308^2 - 145.2426)/10.654 = 79.90.$$
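The singly censored calculation can also be carried out without chart or table by solving the first equation of (30) numerically. What follows is a sketch under stated assumptions, not the author's procedure: bisection replaces chart reading and interpolation, the function names are hypothetical, and the censored count $n_1 = 3$ is an assumption (it is not clearly legible in this copy, but it is consistent with the tabulated value $Y - \xi = 1.68899$ at $\xi = -1.569$).

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_STD = NormalDist()

def y_minus_xi(xi, n1, n):
    """Y(xi) - xi with Y(xi) = (n1/n) * phi(xi) / Phi(xi), i.e. (n1/n) Z(-xi)."""
    ordinate = exp(-0.5 * xi * xi) / sqrt(2.0 * pi)
    return (n1 / n) * ordinate / _STD.cdf(xi) - xi

def first_equation(xi, n1, n, ratio):
    """Left-hand side of the first equation of (30)."""
    d = y_minus_xi(xi, n1, n)
    return (1.0 - xi * d) / d ** 2 - ratio

def singly_censored_estimates(x0, n, n1, nu1, nu2, lo=-3.0, hi=0.0):
    """Bisection on [lo, hi]; assumes the function changes sign there."""
    ratio = nu2 / nu1 ** 2
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if first_equation(lo, n1, n, ratio) * first_equation(mid, n1, n, ratio) <= 0:
            hi = mid
        else:
            lo = mid
    xi = 0.5 * (lo + hi)
    sigma = nu1 / y_minus_xi(xi, n1, n)  # second equation of (30)
    mu = x0 - sigma * xi                 # third equation of (30)
    return xi, sigma, mu

xi_hat, sigma_hat, mu_hat = singly_censored_estimates(70.00, 50, 3, 10.654, 145.2426)
```

The results can be compared with the values $\hat\xi = -1.569$, $\hat\sigma = 6.308$ and $\hat\mu = 79.90$ obtained above by interpolation.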
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UPPER PERCENTAGE POINTS OF THE GENERALIZED 
BETA DISTRIBUTION. I 


By F. G. FOSTER and D. H. REES


Research Techniques Unit, London School of Economics 


and Rothamsted Experimental Station, Harpenden 


1. INTRODUCTION 
In sampling from $k$-variate Normal populations, if $A$ and $B$ are independent estimates, based on $\nu_2$ and $\nu_1$ degrees of freedom, of what on a null hypothesis is the same dispersion matrix, the roots of the determinantal equation

$$|B - \lambda A| = 0$$

are invariant under all linear transformations of the $k$ variates. They are thus unaffected, for example, by a change in the unit of measurement, and so are independent of the magnitudes of the parent variances and covariances. There are clearly advantages in testing the null hypothesis by means of some function of the roots, and arguments for their use have been advanced (Hotelling, 1947, 1951, 1954; Roy, 1945, 1950, 1953, 1954). Among the functions proposed* are: (i) the product of the roots (Wilks, 1932; Pearson & Wilks, 1933); (ii) the sum of the roots (Hotelling, 1947, 1951); (iii) the greatest (or least) root (Roy, 1950, 1953, 1954). We are concerned here with the tabulation of the greatest root, for the case $k = 2$. The computations are based on recursion formulae of Roy (1945). Similar tabulations for the cases $k = 3$, 4 and 5 are in hand.

If $A$ is based on $\nu_2$ and $B$ on $\nu_1$ degrees of freedom, the $k$ roots $\theta$ of

$$|\nu_1 B - \theta(\nu_2 A + \nu_1 B)| = 0$$

lie between 0 and 1 and are related to the roots $\lambda$ of the previous equation by

$$\theta = \frac{\nu_1 \lambda}{\nu_2 + \nu_1 \lambda}.$$

Thus the roots $\lambda$ are immediately obtainable from the roots $\theta$ and vice versa. The relation is analogous to that which exists between the $F$ and the Beta distributions in the univariate case. We have tabulated the 80, 85, 90, 95 and 99% points of the distribution of the greatest root $\theta$, on the null hypothesis of identical parent dispersion matrices. The tables extend previous tables by Pillai (1956) and Nanda (1951).


The joint distribution of $\theta_1, \ldots, \theta_k$ has been given by Fisher (1939), Hsu (1939), Roy (1939) and Girshick (1939) (cf. Mood, 1951). The density function is

$$K \prod_{i=1}^{k} \theta_i^{p-1}(1 - \theta_i)^{q-1} \prod_{i<j} (\theta_i - \theta_j),$$

where

$$K = \pi^{k/2} \prod_{i=1}^{k} \frac{\Gamma\{\tfrac12(2p + 2q + 2k - i - 1)\}}{\Gamma\{\tfrac12(2p + k - i)\}\,\Gamma\{\tfrac12(2q + k - i)\}\,\Gamma\{\tfrac12(k - i + 1)\}}$$

and $p = \tfrac12(\nu_1 - k + 1)$, $q = \tfrac12(\nu_2 - k + 1)$.

* The question as to the relative merits of these functions in different circumstances does not appear to have been fully resolved, and will not be taken up here. A possible approach will be to carry out distribution sampling experiments to obtain [power funct]ions of the tests against various alternatives.
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Thus the distribution function of the greatest root is given by 


$$I_x(k; p, q) = K \int_0^x d\theta_1 \int_0^{\theta_1} d\theta_2 \cdots \int_0^{\theta_{k-1}} d\theta_k \prod_{i=1}^{k} \theta_i^{p-1}(1 - \theta_i)^{q-1} \prod_{i<j} (\theta_i - \theta_j).$$

It will be seen that $I_x(k; p, q)$ reduces to the Beta distribution $I_x(p, q)$ when $k = 1$, and for this reason we have proposed the name 'Generalized Beta distribution' for it.


2. INTERPOLATION 


For use of the tables in connexion with tests of significance, the value of $x$ will be required for the percentage levels tabulated and for integer values of $\nu_1 = 2p + 1$, $\nu_2 = 2q + 1$. Linear interpolation, either $p$-wise or $q$-wise, within the body of the tables will usually suffice for accuracy to at least two decimal places. For greater accuracy 3- or 4-point Lagrangian interpolation should be used. Extrapolation beyond $\nu_1 = 21$ ($p = 10$) is not possible, and the necessity for this is not expected to arise. For $q$-wise interpolation between $\nu_2 = 161$ ($q = 80$) and $\nu_2 = \infty$, 3-point harmonic interpolation, based on $q = 40$, $q = 80$, $q = \infty$ may be used.

Example. Find the 95% point corresponding to $\nu_1 = 13$ ($p = 6$) and $\nu_2 = 403$ ($q = 201$):

  nu_2     q      1/q      x         First diff.   Second diff.
  inf      inf    0        0
                                     1506
  161      80     1/80     0.1506                  -304
                                     1202
  81       40     1/40     0.2708

(Differences are in units of the fourth decimal place.)

The values, $1/q$, are equally spaced, with interval 1/80, and we require the value of $x$ corresponding to $1/q = 1/201$. Thus

$$x = 0 + \frac{1/201}{1/80}\,\Delta x + \frac{(1/201)(1/201 - 1/80)}{2(1/80)^2}\,\Delta^2 x = \frac{80}{201}(0.1506) + \frac{80 \times 121}{2 \times 201^2}(0.0304) = 0.064,$$

which is correct to 3 decimal places.
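The harmonic interpolation of the example is ordinary 3-point forward-difference interpolation carried out in the argument $1/q$. A minimal sketch, in which the function name is hypothetical and the tabular values are those of the example:

```python
def newton_forward(x0, x1, x2, u):
    """3-point forward-difference interpolation at fractional position u,
    measured in tabular intervals from the first point."""
    d1 = x1 - x0               # first difference
    d2 = x2 - 2.0 * x1 + x0    # second difference
    return x0 + u * d1 + 0.5 * u * (u - 1.0) * d2

# Equally spaced arguments 1/q = 0, 1/80, 1/40; target 1/q = 1/201.
u = (1.0 / 201.0) / (1.0 / 80.0)
x_95 = newton_forward(0.0, 0.1506, 0.2708, u)
```

Rounded to three decimals, the result agrees with the value 0.064 found above.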


3. USES OF THE TABLES 
We illustrate two typical applications to tests of significance discussed by Roy (1945, ...): the analysis of dispersion (i) of means and (ii) of regression. In both cases the appropriate critical region is the upper tail of the distribution (cf. Rao, 1952, chapter 7).

It should be noted that a direct application to testing the equality of two dispersion matrices, when we do not know which should be the 'larger', would require the joint distribution of the greatest and least roots. This would correspond to the univariate test in which we use both tails of the $F$ (or Beta) distribution as critical region. Since the greatest and least roots are not independently distributed, these tables are not appropriate for this type of test.
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(i) Analysis of dispersion of means 
Let us select samples of sizes $n_1, n_2, \ldots, n_r$ from $r$ bivariate Normal populations, having the same dispersion matrix. It is required to test the hypothesis that the $r$ mean vectors of the populations are equal. Let $n_1 + n_2 + \ldots + n_r = N$. The combined sums of products matrix ($S$) based on $N - 1$ degrees of freedom, may be analysed into sums of products due to the means ($Q$) with $r - 1$ degrees of freedom and sums of products due to the error ($W$) with $N - r$ degrees of freedom:

$$S = Q + W.$$

Then $Q$ and $W$, divided by their degrees of freedom, are independent estimates of the same parent dispersion matrix, and our test consists in computing the greatest root of $|Q - \theta S| = 0$, and entering the table with $\nu_2 = N - r$, $\nu_1 = r - 1$.

Example. Table 1 gives the analysis of dispersion for the two characters $x_1 = \log_{10}$ (tooth length) and $x_2 = \log_{10}$ (maximum breadth) of the permanent upper second premolar for the three male groups, human (West African), chimpanzee and orang-outang. The groups have been selected from a larger body of data kindly made available to us by the authors (Ashton, Healy & Lipton, 1957). The logarithmic transformation was used to stabilize the within-group variance. Then

$$Q = \begin{pmatrix} 0.544941 & 0.525768 \\ 0.525768 & 0.509075 \end{pmatrix}, \qquad S = \begin{pmatrix} 0.682727 & 0.595110 \\ 0.595110 & 0.601867 \end{pmatrix}.$$

Table 1

                       Degrees of     Sums of products matrix
                       freedom        (1,1)       (1,2)       (2,2)
  Between groups (Q)       2          0.544941    0.525768    0.509075
  Within groups (W)      154          0.137786    0.069342    0.092792
  Total (S)              156          0.682727    0.595110    0.601867

The roots of $|Q - \theta S| = 0$ are now obtained by solving the quadratic

$$(0.544941 - 0.682727\,\theta)(0.509075 - 0.601867\,\theta) - (0.525768 - 0.595110\,\theta)^2 = 0.$$

We find that $\theta_1 = 0.020288$, $\theta_2 = 0.856543$. With $\nu_1 = 2$ and $\nu_2 = 154$, $\theta_{\max} = 0.857$ is seen to be highly significant, since from the tables we see that the 99% point for $\nu_1 = 2$, $\nu_2 = 121$ is 0.096, and for $\nu_1 = 2$, $\nu_2 = 161$ it is 0.075. The group means are given in Table 2.
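For two variates the determinantal equation $|Q - \theta S| = 0$ is a quadratic in $\theta$, so both roots can be written down directly. The sketch below is illustrative only: the function name is hypothetical, and the matrix elements are those of the between-groups and total rows of Table 1, with 0.525768 taken as the off-diagonal element of $Q$.

```python
from math import sqrt

def roots_2x2(Q, S):
    """Roots of |Q - theta*S| = 0 for symmetric 2x2 matrices,
    each given as a tuple (m11, m12, m22)."""
    q11, q12, q22 = Q
    s11, s12, s22 = S
    a = s11 * s22 - s12 ** 2                       # coefficient of theta^2
    b = q11 * s22 + q22 * s11 - 2.0 * q12 * s12    # minus the coefficient of theta
    c = q11 * q22 - q12 ** 2                       # constant term
    disc = sqrt(b * b - 4.0 * a * c)
    return (b - disc) / (2.0 * a), (b + disc) / (2.0 * a)

theta_min, theta_max = roots_2x2((0.544941, 0.525768, 0.509075),
                                 (0.682727, 0.595110, 0.601867))
```

The smaller root is sensitive to the last digits of the tabulated elements, because $|Q|$ is a small difference of nearly equal products; the greatest root, which is the one entered in the tables, is much more stable.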
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For the sake of comparison we illustrate the alternative test criterion based on the 


product of the roots, using these data. The statistic $\Lambda = |W|/|S|$ is equal to the product $\prod_{i=1}^{k} (1 - \theta_i)$. This criterion is arrived at by the likelihood ratio method by Pearson & Wilks (1933), and for $k = 2$, $\sqrt{\Lambda}$ has exactly the Beta distribution $I_x(\nu_2 - 1, \nu_1)$ on the null hypothesis. Applied to the above data, this gives $\Lambda = 0.140554$. Therefore $\sqrt{\Lambda} = 0.375$, and on the null hypothesis that neither $x_1$ nor $x_2$ differs from group to group, $\sqrt{\Lambda}$ should be distributed according to $I_x(153, 2)$. The lower tail of the distribution being here the critical region, this is again highly significant, since $\Pr(\sqrt{\Lambda} < 0.375) = 0.65 \times 10^{-63}$. (Another example is given by Pearson & Wilks, 1933, p. 370.)
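The tail probability for $\sqrt{\Lambda}$ can be checked in closed form, since for second parameter 2 the Beta distribution function reduces to $I_x(a, 2) = x^a\{(a + 1) - a x\}$. A small sketch, with a hypothetical function name:

```python
def beta_cdf_second_param_2(x, a):
    """I_x(a, 2) = x**a * ((a + 1) - a*x), from integrating
    a*(a + 1) * t**(a - 1) * (1 - t) over 0 <= t <= x."""
    return x ** a * ((a + 1) - a * x)

# Lower-tail probability for sqrt(Lambda) = 0.375 under I_x(153, 2).
prob = beta_cdf_second_param_2(0.375, 153)
```

The result is of the order of $10^{-64}$, far beyond any conventional significance level.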


Table 2

                          No. in group    Mean x_1    Mean x_2
  Human (West African)        59           1.846       1.981
  Chimpanzee                  55           1.865       2.008
  Orang-outang                43           1.986       2.119


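(ii) Analysis of dispersion of regression

Let $x_1, x_2, \ldots, x_s, x_{s+1}, x_{s+2}$ be $(s + 2)$ correlated variables for which a combined sums-of-products matrix $(x_{ij})$, based on $\nu$ degrees of freedom, is obtained. It is required to test the hypothesis that the linear regression of $x_{s+1}$, $x_{s+2}$ on the other $s$ variables,

$$E(x_{s+1}) = \sum_{i=1}^{s} b_{1i} x_i, \qquad E(x_{s+2}) = \sum_{i=1}^{s} b_{2i} x_i,$$

has zero regression coefficients, i.e. that the variation in $x_{s+1}$, $x_{s+2}$ is independent of that due to the other variables.

The sums of products matrix for $x_1, \ldots, x_s$ is

$$\begin{pmatrix} x_{11} & \cdots & x_{1s} \\ \vdots & & \vdots \\ x_{s1} & \cdots & x_{ss} \end{pmatrix}.$$

Let the inverse be

$$\begin{pmatrix} x^{11} & \cdots & x^{1s} \\ \vdots & & \vdots \\ x^{s1} & \cdots & x^{ss} \end{pmatrix}.$$

Then the sums-of-products matrix for $x_{s+1}$, $x_{s+2}$ due to the regression is

$$Q = \begin{pmatrix} x_{s+1,1} & \cdots & x_{s+1,s} \\ x_{s+2,1} & \cdots & x_{s+2,s} \end{pmatrix} \begin{pmatrix} x^{11} & \cdots & x^{1s} \\ \vdots & & \vdots \\ x^{s1} & \cdots & x^{ss} \end{pmatrix} \begin{pmatrix} x_{1,s+1} & x_{1,s+2} \\ \vdots & \vdots \\ x_{s,s+1} & x_{s,s+2} \end{pmatrix}.$$

On the null hypothesis $Q$ yields an estimate, with $s$ degrees of freedom, of the dispersion matrix of $x_{s+1}$, $x_{s+2}$. The sum of products matrix for $x_{s+1}$, $x_{s+2}$ due to the error after correcting for $x_1, \ldots, x_s$ is now

$$W = \begin{pmatrix} x_{s+1,s+1} & x_{s+1,s+2} \\ x_{s+2,s+1} & x_{s+2,s+2} \end{pmatrix} - Q.$$

This yields an estimate based on $\nu - s$ degrees of freedom, which, on the null hypothesis, is independent of the previous one. Therefore, as before,

$$S = Q + W = \begin{pmatrix} x_{s+1,s+1} & x_{s+1,s+2} \\ x_{s+2,s+1} & x_{s+2,s+2} \end{pmatrix},$$

and our test consists in computing $\theta_{\max}$ for

$$|Q - \theta S| = 0,$$

entering the table with $\nu_2 = \nu - s$, $\nu_1 = s$.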
Example. The following five measurements were made on the permanent upper first incisor of 83 female gorillas.

  $y_1$ = basal width,
  $y_2$ = maximum width,
  $y_3$ = basal thickness,
  $y_4$ = maximum height,
  $y_5$ = thickness at maximum width.

The data is again taken from the same source acknowledged in the previous example. For the reason given there the logarithmic transforms of these measurements, $x_i = \log_{10} y_i$, were taken and the following sums-of-products matrix (divided by 82) obtained:

                 x_1       x_2       x_3       x_4       x_5
  10^-4 x       13.03
                 5.77     12.36
                 4.90      8.33     11.88
                 3.83     39.14     28.38    229.36
                -1.95    -44.75    -30.95   -261.52    388.31

It is required to test the significance of the linear regressions:

$$E(x_4) = b_{11} x_1 + b_{12} x_2 + b_{13} x_3,$$
$$E(x_5) = b_{21} x_1 + b_{22} x_2 + b_{23} x_3.$$

The inverse of the sums-of-products matrix for $x_1$, $x_2$ and $x_3$ is

$$10^{4} \begin{pmatrix} 0.098298 & & \\ -0.035196 & 0.165996 & \\ -0.015865 & -0.101876 & 0.162152 \end{pmatrix} = A.$$

The sums-of-products matrix due to the regressions is given by

$$Q = 10^{-4} \begin{pmatrix} 3.83 & 39.14 & 28.38 \\ -1.95 & -44.75 & -30.95 \end{pmatrix} A \; 10^{-4} \begin{pmatrix} 3.83 & -1.95 \\ 39.14 & -44.75 \\ 28.38 & -30.95 \end{pmatrix} = 10^{-4} \begin{pmatrix} 146.01 & -169.64 \\ -169.64 & 197.86 \end{pmatrix}.$$
Biom. 44 
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The total sums-of-products matrix before the regression is 


$$S = 10^{-4} \begin{pmatrix} 229.36 & -261.52 \\ -261.52 & 388.31 \end{pmatrix} \quad \text{with 82 D.F.}$$

The greatest latent root of $|Q - \theta S| = 0$ is $\theta_{\max} = 0.637$. With $\nu_1 = 3$ and $\nu_2 = 79$ this is seen to be highly significant, since from the tables we see that the 99% point for $\nu_1 = 3$ and $\nu_2 = 71$ is 0.186, and for $\nu_1 = 3$, $\nu_2 = 81$ it is 0.165.
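The regression example can be reproduced from the sums-of-products matrix alone. The following sketch is not the authors' computing scheme: it obtains $Q$ by solving two small linear systems with Cramer's rule rather than by forming the inverse explicitly, the helper names are hypothetical, and the entries 4.90, 8.33, 11.88 and $-1.95$, which are partly garbled in this copy, are assumptions chosen because they reproduce the printed inverse. The common factor $10^{-4}$ cancels in $\theta$ and is omitted.

```python
from math import sqrt

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, r):
    """Solve m * c = r by Cramer's rule (adequate for a 3x3 system)."""
    d = det3(m)
    sol = []
    for j in range(3):
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = r[i]
        sol.append(det3(mj) / d)
    return sol

X = [[13.03, 5.77, 4.90],
     [5.77, 12.36, 8.33],
     [4.90, 8.33, 11.88]]        # sums of products of x1, x2, x3
r4 = [3.83, 39.14, 28.38]        # products of x4 with x1, x2, x3
r5 = [-1.95, -44.75, -30.95]     # products of x5 with x1, x2, x3

c4, c5 = solve3(X, r4), solve3(X, r5)
q11 = sum(a * b for a, b in zip(r4, c4))
q12 = sum(a * b for a, b in zip(r5, c4))
q22 = sum(a * b for a, b in zip(r5, c5))

s11, s12, s22 = 229.36, -261.52, 388.31   # total matrix S for x4, x5
a = s11 * s22 - s12 ** 2
b = q11 * s22 + q22 * s11 - 2.0 * q12 * s12
c = q11 * q22 - q12 ** 2
theta_max = (b + sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
```

The leading element of $Q$ and the greatest root can then be compared with the printed values 146.01 and 0.637.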


4. METHOD OF COMPUTATION 


The computations were carried out on the N.R.D.C. Elliott 401 Computer at Rothamsted Experimental Station. We restrict consideration below to the case $k = 2$ for which the tabulations have been made. Let $\theta_{\max}$ denote the greatest root of

$$|\nu_1 B - \theta(\nu_2 A + \nu_1 B)| = 0,$$

where $A$ and $B$ are independent estimates, based on $\nu_2$ and $\nu_1$ degrees of freedom, of a parent dispersion matrix of a bivariate Normal population. Define

$$I_x(2; p, q) = \Pr\{\theta_{\max} \le x\},$$

where $p = \tfrac12(\nu_1 - 1)$, $q = \tfrac12(\nu_2 - 1)$. Pillai (1956) gives a formula which may be written

$$I_x(2; p, q) = K[2B_x(2p, 2q) - x^p(1 - x)^q B_x(p, q)],$$

where

$$K = \frac{\pi^{1/2}\,\Gamma(p + q)\,\Gamma(p + q + \tfrac12)}{\Gamma(p)\,\Gamma(p + \tfrac12)\,\Gamma(q)\,\Gamma(q + \tfrac12)},$$

and $B_x(p, q)$ is the Incomplete Beta function. As it stands, this formula is not suitable for computation. If, however, we distribute the normalizing constant, $K$, we obtain the formula

$$I_x(2; p, q) = I_x(2p, 2q) - I_x(p, q)\,a_x(p, q),$$

where $I_x(p, q)$ is the Beta distribution, and

$$a_x(p, q) = \frac{\pi^{1/2}\,\Gamma(p + q + \tfrac12)}{\Gamma(p + \tfrac12)\,\Gamma(q + \tfrac12)}\,x^p(1 - x)^q.$$

$I_x(p, q)$ may be computed recursively for integral values of $p$ and $q$ by means of the relations

$$I_x(0, q) = 1 \quad (q = 0, 1, 2, \ldots),$$
$$I_x(p, 0) = 0 \quad (p > 0),$$
$$I_x(p, q) = x\,I_x(p - 1, q) + (1 - x)\,I_x(p, q - 1) \quad (p, q > 0),$$

and $a_x(p, q)$ may be computed by means of

$$a_x(0, 0) = 1,$$
$$a_x(p, 0) = x\,a_x(p - 1, 0) \quad (p > 0),$$
$$a_x(0, q) = (1 - x)\,a_x(0, q - 1) \quad (q > 0),$$
$$a_x(p, q) = \left[1 + \frac{q}{p - \tfrac12}\right] x\,a_x(p - 1, q) \quad (p, q > 0).$$
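The recursions of this section translate directly into a short program. What follows is a sketch for integral $p$ and $q$ only (that is, odd $\nu_1$ and $\nu_2$); the function names are hypothetical, and nothing here is meant to reproduce the organization of the original machine programme.

```python
def beta_cdf_table(x, pmax, qmax):
    """Table of I_x(p, q) for integers 0 <= p <= pmax, 0 <= q <= qmax, built
    from I_x(0, q) = 1, I_x(p, 0) = 0 (p > 0) and the two-term recursion."""
    table = [[0.0] * (qmax + 1) for _ in range(pmax + 1)]
    for q in range(qmax + 1):
        table[0][q] = 1.0
    for p in range(1, pmax + 1):
        for q in range(1, qmax + 1):
            table[p][q] = x * table[p - 1][q] + (1.0 - x) * table[p][q - 1]
    return table

def a_x(x, p, q):
    """a_x(p, q), starting from a_x(0, q) = (1 - x)**q and stepping p upwards."""
    value = (1.0 - x) ** q
    for j in range(1, p + 1):
        value *= (1.0 + q / (j - 0.5)) * x
    return value

def greatest_root_cdf(x, p, q):
    """I_x(2; p, q) = I_x(2p, 2q) - I_x(p, q) * a_x(p, q), integer p, q >= 1."""
    table = beta_cdf_table(x, 2 * p, 2 * q)
    return table[2 * p][2 * q] - table[p][q] * a_x(x, p, q)

# For p = q = 1 the greatest-root density is 6*(t1 - t2) on 0 < t2 < t1 < 1,
# so the distribution function of the greatest root is x**3.
check_cube = greatest_root_cdf(0.5, 1, 1)
# Tabulated 80% point for nu_1 = 3, nu_2 = 5 (p = 1, q = 2) is 0.7728.
check_table = greatest_root_cdf(0.7728, 1, 2)
```

Percentage points could then be found by inverse interpolation, or by a root search on $x$, against a function of this kind.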
By means of these relations, we computed the distribution for [...]. A copy of these tables [...] the Royal Statistical Society. The percentage points given here were obtained, in the main, by inverse interpolation [...] calculating machine. The values have an error of [...] decimal place, and in general* this error is less than 0.5 [...]. [...] computations were made to obtain the values corresponding to $p = 0.5$.

Since for fixed $x$ and $p$, $I_x(p, q) \to 1$ as $q \to \infty$, we have the approximation for small [...]

$$I_x(2; p, q) = 1 - a_x(p, q).$$

Within the range of $p$ and $q$ used, we found empirically that this formula was applicable when [...] was greater than about 0.92. The computation was switched to the approximation when it became applicable, which substantially increased the speed.

It will be seen from the above relations that the natural way to compute $I_x(2; p, q)$ would be to [...] for each $p$ and $q$, as is required for tables of the distribution [...] recursively through for $p$ and $q$. To have obtained a final tabulation of the distribution by this method we should have had to store the information on [...] numbers. This was beyond the internal capacity of the machine, and examination of a scheme to output the information for subsequent editing on the machine showed that the method would have seriously reduced the advantage of the more direct method of computation.

The tables were therefore prepared on the basis of a fixed $q$ for all $p$ and then all $x$. The fact that the same values of $I_x(p, q)$ and $a_x(p, q)$ have to be repeatedly computed was to some extent offset by the gain in flexibility of the programme in that (i) the calculations could be restarted at any values of $x$, $p$, $q$ in case of any machine failure and (ii) it was possible to alter the programme easily at a later stage in order to check directly the accuracy of the percentage points obtained by inverse interpolation in the tables.

The tables were printed to four decimal places directly on the typewriter output in a suitable format: each page consisting of the distribution for one value of $q$, arranged in ten columns corresponding to $p = 1\,(1)\,10$ and with $x = 0.01\,(0.01)\,0.99$ in groups of five. To save on the output time, the following controls were included in the programme. The fact that $I_x(2; p, q)$ is monotonically decreasing for increasing $p$ was used to avoid [...]: as soon as $I_x(2; p, q)$ became less than 0.00005, nothing more was then printed on that line and a fresh calculation started with $x$ increased by 0.01. Whenever $I_x(2; 10, q)$ was greater than 0.99995 the calculation was re-started with the next value of $q$ and $x = 0.01$.
5. FURTHER WORK 

In the light of the experience gained with this computation, we have decided to forego the tabulations of the distribution function in the projected computations for $k = 3$, 4 and 5, and to compute the percentage points directly by automatic computation, utilizing the natural order of computation and using a method of inverse interpolation.

The tables were made in consultation with Prof. S. N. Roy, for whose advice the authors wish to express their thanks.

* The largest error is at the 99% points for $q < 20$.
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Generalized Beta distribution: 100P % points for x 


Pa ae l 
5 7 9 1 | 18 | 15 | 17 | ?9 21 


P 


5 0-80 0-7011 | 0:7728 | 0-8454 0-8825 | 0-9052 | 0-9205 | 0-9315 | 0-9398 | 0-9464 | 0:9516 0-9559 


To -1449 8075 -8696 -9013| -9205| -9334| -9427| :9497 9552 | -9596| -9632 
E. 1950 8463 -8968| -9221| -9374| -9476 | -9550| -9605 -9649 | +9683 | -9712 
“95 “8943-9296 -9471| -9576| -9645 -9696 | -9733 -9763 +9787 | +9806 
-99. .9377 -9542| -9698| -9774  -9819| -9850 -9872 | -9888 | -9900 -9910 | -9918 


0-8734 | 0:8876 | 0-8989 | 0-9082 | 0-9158 
.8889 | -9014 | -9114| -9196 | -9264 
3| -s474| -8741| -8928| -9066| -9173 | -9257| -9326 -9383 
4| -8839| -9045| 9189| 9205) -9376 | -9440 -9493 | -9536 
3 


j| 0:7954 | 0-8303 | 0-8550 


7 0-80 0:5638 0-6469 | 
-8194 | +8507 | -8726 | 


‘85| -6085 -6851| -7 
90| -6628| -7307 -80 
| *5, -7370| -7919| -85 
| ‘99-8498 -8826  -91 


e 
I 
~ 
we 


| 
-9358 | -9475 | 9556 9615 | -9660 | -9695 | 9724 | -9748 
0-7933 | 0-8173 | 0:8363 | 0-8517 | 0-8644 
.7848 | -8136 | :8354 +8527 | -8666 | -8782| -8878 
8120 | -8374 | :8567 | 8720 | 8842| -8943 | -9027 
8696 | :8853 | 8976 -9076 | +9157 | +9225 


8487 | 786: 6 5 
-9051 | +9185 | .9286 | 9364 | 9427 +9478 


9 0-80 0-4688 (07619 


| 85. -5108 
“90 | -5632 

| '95| -6383 | 
99, .7635 


7 | 0:7884 | 0-8009 | 0-8225 | 0-8357 


11 0.80 5| anang | 0:765 
0-1003 | 0-48 .6536 | 0.7016 | 0-7376 | 0763 9 | 0-822 

‘85 -4389 pco pom "DIU ngo | -854| 8008  -8285| ESTR | -8500 
90-4880 | -6551| -7138| * ‘7354 | -8089| -8278| 8433 j 8561 “8670 
95 | -5603 «7091 -7600  :7932 .8212  :8413 | :8573 “8702 pd NS 

| E | ‘s007 | -8790| -8029 | -0039 | :9128 | -9202 | -9265 


| ‘99 | -6878 
13 


-7989 :8357 | 


> 


0:7443 | 0-7654 +7832 | 0:7984 
7632 | -7829| 7996 8138 
7862 | :8042 | :8194 8324 
+8181 | -8337 | -8468 | -8580 


9-80 | 0-3490 | 0-423: 
'85) -3843| - 
90-4298 -5016 


"95 x x are | 
a9 | eec Mee 8706| -S821| -8915 | -8997 
ji233 — 6770 | 
15 0.39 | | | 0:7042 | 07271 077467 0/1636 
| -85 | Paes | T-3809 | PTS | -0664 (17936 | -7453 | 7638| +7798 
| ou] 2448| -A124 d s 2| 7475 | -7675| «847 | -7993 
30| 3537 4527 5408| -6000| "PD | 7598 | TM | 089 | -8138 | 8268 
"95. +4478 | -518 MO | esa | «011, 7338] 77275 | -8378| -8512 | 28629 | :8726 
£ -5130 | -6003 | um 7788 8013 | +8216 8378 | :8512 862 72 


` '99 | 5687, -6237| 6954 
9 0-7312 


17 gn) | 
s | 0:2774 | 0:3447 0-4412 “7478 
| ‘on | 8072 | .3741 | -4690 -7681 
pr | 78463 | -4122| -5043 ; -7969 
3B 4065 -4697 -5564 50 | -6605 | -8458 
| 99| .5299| -5773| -6512| -7008 7373 | | iis ies 
x: | | | ons | goa | 0-6344 | 0:6597 | 06817 | 07 
| 0-80 0-25 asg 5273 | 0:5697 0:604 4 SEA | | goos. 7 
-2515 0:3148 | 0-4073 | 0:4748 0-521 .6257 6541 | 6785 6995 7179 
d| | we Bi 19 Ob Eh m "moon" 
| P | +2791 | -3424| -4338| 4999 "5509 | 59 6 517 6786 | -7016| “7214, “7387 
| 90| .3155| .g;g2 | -4677| -5315| -5805 | bit Sga | -7139| -7349| -7528 | -7684 
'95| 3719 | -4327 192 | -5782| -6238 | 6509 "05. | +7756 | 7926| “8071 | -8198 
-99 9| -4327| -5182 à 2014 .7313 | 7199 | 
9| «5| eo ene] dem T 301 0-6529 0:6729 
21 | | | 5738 0.6040 0-6: -652 -67 
0-80 | 59 | 0:5383 | 0-973 .eoT | .8900 
0-2300 0.2895 0:3782 | 0:4439 0-4959 | Yang "n | -6237| -6489| :6707 59 
85 | 9557 ku ge ese] wien #8002 | po | .g482 | -6721| 6929| -7112 
| oe 5479 | -5875| 'ogg1| .ggas| 7058| -7243 7415 
j | | .1652 | -7810| -7946 


-90 7 
‘on | 72897 | -3493| -4358 | 


95 6277 | wae 
aol 2427] aor | -4847 | Boor | asij 7400) 9. 
74479 | -5014 | -5762 | | 
| =P, where p=}: 1): q=- 1). 


x)= Lp. 0 


0-6262 
-6441 
-6663 
-6986 
+7558 


0-6015 


-6193 
-6415 


0:5572 
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Generalized Beta distribution (cont.) 
ty UR 2 18 5 7 | 9 11 13 15 17 
I | | | 
P | | | | 
23 0-80 | 0-2119 | 0-2680 | 0-3528 | 0-4167 | 0-4678 | 0-5101 | 0-5457 | 0:5763 | 0-602 
-85| -2359| -2924| -3769| -4401| -4903| -5315 -5661) -5958| -6215 
:90| -2677| -3244| -4080| -4699| -5185| -5584 -5918| -6202| -6448 | 
-95| -3177| -3737| -4551| -5143| -5000  -5981| -6294| -6558| -6787 | 
-99 | -4179| -4701| -5443| -5971| -6380| -6703| -6970| -7197| -7391 | 
| | | | 
25 0-80 | 0-1964 | 0-2494 | 0-3306 | 0-3926 | 0-4428 | 0-4846 | 0-5202 
| -85| -2189| -2725| -3536| -4151| -4645| -5055| -5402 | 
:90| -2488| -3027| -3834| -4439| -4921' -5319| -5655 | 
-95 | -2960| -3498| -4287| -4872| -5333| -5710| -6027 
-99| -3915| -4424| -5155| -5685| -6096 -6429 -6708 
27 0-80 | 0-1830 | 0-2333 | 0-3110 | 0-3711 | 0-4202 | 0-4615 | 0-4968 | 0:5275 | 0-5546 
*85| 2042| -2551| -3331| -3928| -4413| -4818| -5165| -5465| -5728 
90 | .2324| -2839| -3616| -4206| -4682| -5077| -5413| -5704| -5958 
:95| +2771 | -3286| -4052| -4626| -5084| -5462| -5781| -6056| -6296 
99 | -3682| -4176| -4895| -5422| -5837| -6175| -6458| -6700| -6909 
29 0-80 | 0-1713 | 0-2191 | 0-2936 | 0-3518 | 0-3998 | 0-4404 | 0-4754 | 0-5060 | 0-5331 
:85| -1913| -2398| -3147| -3727| -4202| -4603| -4947| -5247| -5512 
90 | -2180| -2672| -3420| -3996| -4463| -4855| -5191| -5482| -5738 
:95| 2604| .3099| -3840| -4404| -4855| -5232| -5553| -5830| -6074 
:99| -3475| -3954| -4659| -5182| -5597| -5938| -6225| -6472| -6687 
31 0-80 | 0-1611 | 0-2065 | 0-2780 | 0-3344 | 0-3812 | 0-4211 | 0-4557 | 0-4861 | 0-5131 
“85| +1800 | :2262| -2982| -3546| -4010| -4405| -4746| -5045| -5309 
:90| -2053| -2523| -3246| -3805| -4264| -4651| -4985| -5277| -5534 
‘95| -2457| -2931| -3650| -4200| -4647| -5022| -5342| -5620| -5866 
:99| -3290| :3754| -4444| -4961| -5374| -5717| -6008| -6258| -6477 
| 
33 0-80 | 0-1519 | 0-1953 | 0-2639 | 0-3186 | 0-3643 | 0-4034 | 0-4375 | 0-4677 | 0-4946 
‘85| 1699 | -2141 | -2834| -3381| -3835| -4223| -4560| -4857| -5121 
:90| -1940| -2389| -3087| -3632| -4081| -4464| -4794| -5084| -5342 
:95 | +2324) -2781| -3478 | -4018 | -4455| -4825| -5145| -5424| -5671 
:99| 3123| -3573| -4247| -4757| -5168| -5511| -5803| -6057| -6279 
35 0-80 | 0-1438 | 0-1852 | 0:2512 | 0-3042 | 0-3487 | 0-3871 | 0-4208 | 0-4506 | 0-4773 
‘85| -1609| -2032| -2699| -3230| -3674| -4055| -4388| -4682| -4945 
:90| -1838| -2270| -2943| -3473| -3913| -4290| -4617| -4906| -5163 
‘95| +2206  -2645 | :3320| -3845| 4277  -4644| -4962| -5240| -5487 
-99| -2972| -3408| -4066| -4569| -4974| -5318| -5612| -5867| -6091 
37 0-80 | 0-1365 | 0-1762 | 0-2397 | 0-2910 | 0-3344 | 0-3721 | 0-4052 | 0-4347 0-4612 
‘85| -1528| -1933| -2577| -3092| -3526| -3900| -4228| -4519| .4781 
-90| :1747| -2161| -2812 | -3328| -3759| -4129| -4453| -4739| -4995 
-95| -2098| -2521| :3175| -3689| -4113| -4475 | -4791| -5068| -5315 
| -99| +2884] -3257| -3900| -4394| -4797| -5138| -5430| -5688| -5915 
| | | | 
39 0-80 | 0-1299 | 0-1679 | 0-2292 | 0-2790 | 0-3213 | 0-3582 | 0-3907 | 0-4198 | 0-4460 
-88| -1455, -1844| -2465 2965 -3389| -3756| -4079| -4367  .4627 
-90 | -1664| -2062| -2691 :3194| -3616| -3980 -4299| -4583| -4837 
| -95| -2001|  -2408 -3044| -3544| -3961| -4319| -4631 | -4906 | -5152 
-99| -2709| :3119| -3747| -4232| -4631| -4969| -5260| -5517| -5747 


0-6469 
-6640 
-6853 
«7160 
-7705 


0-6226 
+6397 
-6610 
-6920 
-7474 


0-6000 
-6170 
-6383 
-6693 
1254 


0-5789 
+5958 
+6170 
-6480 
+7043 


0-5591 
-5759 
-5969 
-6278 
-6843 


0-5406 
-5572 
-5181 
-6088 
-6653 


0:5233 
-5396 
-5603 
-5907 
-6471 


0-5070 
+5231 
.5434 
+5137 
-6299 


0-4916 
-5076 
-52117 
-5576 | 
.6134 


This table gives the values of $x$ for which $\Pr(\theta_{\max} \le x) = I_x(2;\, p, q) = P$, where $p = \tfrac12(\nu_2 - 1)$, $q = \tfrac12(\nu_1 - 1)$.
Va |
3 2 3 5 7 9 n 13 15 17 19 21 
Ws a | | 
% | | | 
| | 
P | | | 
41 o , j d "S 
1 0-80 | 0-1239 | 0-1604 | 0:2195 | 0-2679 | 0-3091 0:3452 | 0-3772 | 0-4059 | 0-4319 | 0-4555 | 0-4771 
‘85| 41388 1762| -2 .3262 | -3622| -3940| 4225| 4482| 4715| -4928 
‘90| -1589 -1972 | -3483 | -3840| -4155| -4436| -4689| 4918| :5127 
-95| -1912| -2306 3819 | -4172| -4480| -4754| -4999| 5221| :5423 
“99 | -2594 -2993 -4472 | -4812 5100 | -5358| -5585| +5792 -5977 
si 0-80 | 0-1006 0:1311 | 0-1814 .2598 | 0-2923 | 0:3216 | 0-3482 | 0:3725 | 0-3950 | 0-4157 
‘85| 1129! -1443| -1955 .2748 | -3073| -3366| -3631| -3874| 4097| -4303 
“90 “1295 1619 | -2142 +2941 | :3267 3559 | -3823| -4063| -4284 4487 
95} -1565| +1900 | 2434 13939 | -3564| -3853| -4114| -4380 | -4506| -4765 
:99 | -2140| -2486| -3029 .3828 | -4144| -4424| -4675| -4900| 5106] :5295 
| | 
61 0-80 | 0-0846 0-1109 | 0-1545 2240 | 0-2534 | 0-2801 | 0-3047 | 0-3274 | 0-3485 | 0:3682 
‘85| -0951  -1221, +1667 2372 | -2668| -2936| -3182| -3409| -3620| -3816 
:90| -1093  -1373| -1830 2544| -2842| -3111| :3357| 3583| -3792 +3987 
95| -1324| -1615| -2086 2810| -3109| -3378| -3622| -3847 | -4054 4246 
:99| -1821| -2126| -2610 .3341| -3637| -3903] -4142| -4361) -4561 +4743 
71 0-80 | 0-0730 | 0-0960 | 0-1345 0-1675 | 0-1969 | 0-2236 | 0-2481 | 0-2708 bane eee ones 
-85| .og99| 1o59] -1454| 1788| :2087| -2857 .3604| +2832] :3043| > . 
:90| -0946 ami] 1597 .1939 | -2242| -2514 | :2762 -2991 | +3203) 3401 (s 
05] -1148| -1404| -1824| -2175| -2481 | 2756 3006 | -3236 iem 24 Re 
-99| .1584  -1855| -2293| -2051| 2903 .3939 | -3488| :3718 | :39 
81 o. um X 7 -2 0:2226 | 0:2437 0:2634 | 0-2819 | 0:2994 
Sa 0:0643 | 0-0847 | 0-1191 | 0:1489 | 01578 02000 | sas | -2550| 2748| -2934| -3109 
a] 0728) abe | 1789 -1692 | :1803 | SS :zasa | :2097 | -3890| :3082| -3257 
E -0833 | -1052| -1417| 71727 | p dub 2707 | 2922| -3122 .3308 | -3483 
5| -1013| -1242| -1620| :1939 pol 2919 3153| -3368| :3567 -3751 | :3924 
99| .1402| -1646| -2044| -2374| 2002| 7 
91 0. 5 | 0- 0.2019 | 0:2215 | 0:2399 | 0:2573 | 0:2738 
9-80 | 0:0574 | 0-0757 | 0-1069 | 0-1340 | 01555 pn "sraa | -2319| -2505| -2680| -2845 
‘85| .0646 | -0836| -1156| :1433 2 2042| 2255 | "2455 .2642 | -2818| -:2984 
‘90| -o745| -og42| -1273| +1556 “1810 2945 | -2462| 2664 .2852 | -3029 | +3195 
-95| .0908| 1114) -1457| -1780| 72010| "27-7 | 2874| -3081 .3270 | -3444 | :3008 
e| caer | Tazo] su n9 | 2426) SOP 
101 vega | 0:1847 | 0-2029 | 0:2202 | 0-2366 0-2522 
0-80 0-0518 | 0-0685 0-0969 | 0-1218 01444 0107 .1942| -2127 .2301 | -2466 | +2622 
'88| .o584| .o756| -1049| 1304) ‘1533| s ‘ogg | -2253| -2428 | -2595 +2752 
90] .0673| .0853| -1155| “1416 -1681 | pes ‘2958| 2447| "2626| 2792 -2951 
'98| 0820| -1009| +1325 .1594| +1835 en .9644| -2836| :3015 -3184 | -3342 
*99| -1140| -agas | -1678| -1961| 32M | 7 
121 1407 | 01577 | 01739 01892 | 0-2038 | 0-2178 
ro 0-0434 | 0:0575 | 0:0817 | 0-1030 | 0-12 0140 | 700 | -1824| 1078) AMD] aa 
'88| .0489| -0635| -0885| -1104| ' : 1768 | 1994) ` Site oR 
80! -ose5| .o717| -0976| 1200| “140 pe “1936 | -2105 | :2204| -2415 His 
'98! .06gg | -ogag | -1120| :1393] 1563 “Sigg | -2274| 2447 .2610 | 2763| - 
'99| .0960| -1133| -1423| -1670 "1891 596 | 0-1711 
161 o | | osel 01221 | 01951 | C197 ue aio 
"80 | o. a .0941 | 0-1059 S .419| -1545| -16 du 
gg | 00328 | 0-0435 | 0:0622 RE perm 1148 1 aon ‘1635 | -1788| 1876 
‘go | 0370| -0481 -0674 pes ‘yogi | “1281 Es 3o44| -1775| -1900| -2020 
195 | 0427| «0544 | -0744 00 | .1206| -2301 1506 | "l| -2056| -2185 | +2309 
ig *0521| -0645| -0856 | -1039 | 1464 | .1627 -1778 
:0730 | .ose4 | «1092| 1288] 7 | | X PR 


MISCELLANEA

On problems in which a change in a parameter occurs at an unknown point

By E. S. PAGE
Department of Mathematics, University of Durham

1. INTRODUCTION

Given a sample of $n$ independent observations in the order in which they were obtained, $x_1, \ldots, x_n$, we consider the problem of testing the null hypothesis that all the observations come from the same population with distribution function $F(x \mid \theta)$ against the alternatives that the first $m$ $(0 \le m < n)$, $x_1, \ldots, x_m$, come from $F(x \mid \theta)$ and $x_{m+1}, \ldots, x_n$ come from $F(x \mid \theta')$ $(\theta' \ne \theta)$, where $m$ is unknown. In an earlier publication (Page, 1955) a one-sided test for a change in the mean of a distribution was proposed; the procedure was to record the cumulative sum $S_r = \sum_{j=1}^{r} (x_j - \theta)$, and, if the mean path after the change had a greater slope than that before the change, to use as a test statistic the rise in the cumulative sum above its least value, i.e.

$$\max_{0 \le r \le n} \Big( S_r - \min_{0 \le i \le r} S_i \Big), \qquad S_0 = 0,$$

large values being significant. Some critical values were given for the case where the $x_i$ are 0 or 1 binomial variables. Here, a discrimination approach is adopted for the more general case and the procedure is shown to yield a modification of the previous test.

2. ONE-SIDED CASE

Suppose $\theta$, $\theta'$ known, and let the sample be $\mathbf{x} = (x_1, \ldots, x_n)$. If we regard the detection of a change as a problem in discrimination we have $n + 1$ hypotheses between which it is desired to discriminate; they are $H_0, H_1, \ldots, H_n$, where $H_i$ is the hypothesis that the first $i$ observations are drawn from $F(x \mid \theta)$ and the remainder from $F(x \mid \theta')$, i.e. that the change occurs after the $i$th observation. A method of discrimination is specified by the definition of a division of the whole sample space into mutually exclusive regions $R_i$ $(i = 0, 1, \ldots, n)$; the hypothesis $H_i$ is preferred if the sample point falls within $R_i$. If the $H_i$ are a priori equally likely the minimum probability of misclassification is obtained when the regions $R_i$ $(i = 0, 1, \ldots, n)$ are defined by (Rao, 1948),

$$\mathbf{x} \in R_i \quad \text{if} \quad L_i(\mathbf{x}) \ge L_j(\mathbf{x}) \quad (j \ne i), \qquad (2·1)$$

where $L_i(\mathbf{x})$ is the likelihood of the hypothesis $H_i$.

Example 1. Mean of a normal distribution

Suppose that we wish to discriminate for a change in the mean from $\mu$ to $\mu + \delta$ $(\delta > 0)$ with the standard deviation, $\sigma$, known and constant. Then (2·1) gives $R_i$ defined by

$$\frac{L_{i+k}(\mathbf{x})}{L_i(\mathbf{x})} = \exp\Big\{ -\frac{\delta}{\sigma^2} \sum_{j=i+1}^{i+k} (x_j - \mu - \tfrac12\delta) \Big\} \le 1 \quad (k = 1, \ldots, n - i),$$

$$\frac{L_{i-k-1}(\mathbf{x})}{L_i(\mathbf{x})} = \exp\Big\{ \frac{\delta}{\sigma^2} \sum_{j=i-k}^{i} (x_j - \mu - \tfrac12\delta) \Big\} \le 1 \quad (k = 0, 1, \ldots, i - 1).$$

Hence $R_i$ is defined by

$$\sum_{j=i+1}^{i+k} (x_j - \mu - \tfrac12\delta) \ge 0 \quad (k = 1, \ldots, n - i), \qquad (2·2)$$

$$\sum_{j=i-k}^{i} (x_j - \mu - \tfrac12\delta) \le 0 \quad (k = 0, 1, \ldots, i - 1). \qquad (2·3)$$

The discrimination procedure can be more clearly described in terms of the cumulative sum of the $(x - \mu - \tfrac12\delta)$. Let $S_r = \sum_{j=1}^{r} (x_j - \mu - \tfrac12\delta)$, $S_0 = 0$. Then the sample point $\mathbf{x} = (x_1, \ldots, x_n)$ lies within $R_i$ if the following hold: (2·2) is

$$S_{i+k} - S_i \ge 0 \quad (k = 1, \ldots, n - i),$$

and (2·3) is

$$S_i - S_{i-k-1} \le 0 \quad (k = 0, 1, \ldots, i - 1). \qquad (2·4)$$

Hence $S_i$ is the least of the cumulative sums. Thus the procedure may be conveniently carried out by recording the cumulative sum on a chart and selecting the hypothesis corresponding to the minimum of the sum.
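
The rule of Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preferred_change_point` and the tie-breaking rule (the earliest minimum is taken) are not from the paper; the increments $x_j - \mu - \tfrac12\delta$ and the choice of the minimum follow (2·2) to (2·4).

```python
from itertools import accumulate


def preferred_change_point(xs, mu, delta):
    """Return (i, sums): H_i is the hypothesis preferred by the minimum of S_0..S_n.

    i == len(xs) corresponds to H_n, i.e. no change in the sample.
    The name and the earliest-minimum tie-break are illustrative assumptions.
    """
    # S_0 = 0, S_r = sum_{j<=r} (x_j - mu - delta/2)
    sums = [0.0] + list(accumulate(x - mu - delta / 2 for x in xs))
    # index of the least cumulative sum (first one if there are ties)
    i = min(range(len(sums)), key=sums.__getitem__)
    return i, sums


if __name__ == "__main__":
    i, sums = preferred_change_point([0, 0, 0, 1, 1, 1], mu=0.0, delta=1.0)
    print(i, sums)
```

If the minimum falls at the last point, no change is indicated; otherwise the index of the minimum is the last observation taken to come from the original mean.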


Example 2. Distributions with a sufficient statistic

For distributions with frequency functions of Koopman's form

$$f(x \mid \theta) = \exp\{A(\theta) B(x) + C(\theta) + D(x)\}. \qquad (2·5)$$

Then

$$\frac{f(x \mid \theta')}{f(x \mid \theta)} = \exp\{B(x)\,\Delta A(\theta) + \Delta C(\theta)\}, \qquad (2·6)$$

where $\Delta G(\theta) = G(\theta') - G(\theta)$. Accordingly the regions $R_i$ are given by

$$\Delta A(\theta) \sum_{j=i+1}^{i+k} B(x_j) + k\,\Delta C(\theta) \ge 0 \quad (k = 1, \ldots, n - i), \qquad (2·7)$$

$$\Delta A(\theta) \sum_{j=i-k}^{i} B(x_j) + (k + 1)\,\Delta C(\theta) \le 0 \quad (k = 0, \ldots, i - 1). \qquad (2·8)$$

If $\Delta A(\theta) > 0$, the cumulative sum

$$S_r = \sum_{j=1}^{r} \Big[ B(x_j) + \frac{\Delta C(\theta)}{\Delta A(\theta)} \Big] \qquad (2·9)$$

may be recorded and its minimum used to select the hypothesis preferred. If $\Delta A(\theta) < 0$, the maximum of the same cumulative sum is used in the discrimination procedure.

For instance, for a normal population whose mean remains constant at $\mu$ but whose variance may change from $\sigma_0^2$ to $\sigma_1^2$, the cumulative sum to be recorded is

$$S_r = \sum_{j=1}^{r} \Big[ (x_j - \mu)^2 - \frac{2\sigma_0^2\sigma_1^2}{\sigma_1^2 - \sigma_0^2} \log\frac{\sigma_1}{\sigma_0} \Big].$$

In some situations it may be a priori more probable that the initial value of the parameter holds throughout the sample than that a change should occur. If the hypothesis $H_n$, that no change has occurred, is a priori $c$ times as likely as any of the other hypotheses, the regions $R_i$ for the discrimination procedure are defined by

$$\mathbf{x} \in R_i \quad \text{if} \quad L_i(\mathbf{x}) \ge L_j(\mathbf{x}) \ (j \ne i;\ i, j \ne n) \quad \text{and} \quad L_i(\mathbf{x}) \ge c L_n(\mathbf{x}) \ (i \ne n),$$

$$\mathbf{x} \in R_n \quad \text{if} \quad c L_n(\mathbf{x}) \ge L_j(\mathbf{x}) \ (j \ne n). \qquad (2·10)$$

For distributions with a sufficient statistic the above conditions amend (2·7) and (2·8) only in the comparison between $H_i$ and $H_n$. $H_i$ is preferred to $H_n$ if

$$\Delta A(\theta) \sum_{j=i+1}^{n} B(x_j) + (n - i)\,\Delta C(\theta) \ge \log c. \qquad (2·11)$$

Accordingly the hypothesis to be preferred is that indicated by the minimum (for $\Delta A(\theta) > 0$) of the cumulative sum, $S_r$, of (2·9) if the rise of $S_n$ above its least value exceeds $\log c / \Delta A(\theta)$; otherwise the hypothesis that no change has occurred is adopted. For the test on a normal mean with unit variance this rise of the end-point above its minimum is $\delta^{-1} \log c$.
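
As an illustration of Example 2, the following Python sketch applies the Koopman-form rule to a change of variance in a normal population with known mean, including the prior weight $c$ on the no-change hypothesis. It is a sketch under assumptions: the function name is hypothetical, and the identification $A = -1/(2\sigma^2)$, $B(x) = (x - \mu)^2$, $C = -\log\sigma$ is worked out here from the normal density rather than quoted from the paper. The comparison used is $\log\{L_i/L_n\} = \Delta A\,(S_n - S_i)$ against $\log c$.

```python
import math
from itertools import accumulate


def variance_change_point(xs, mu, sigma0, sigma1, c=1.0):
    """Index i of the preferred hypothesis H_i; i == len(xs) means 'no change'.

    Hypothetical helper: A = -1/(2 sigma^2), B(x) = (x - mu)^2, C = -log(sigma).
    """
    n = len(xs)
    delta_a = (sigma1**2 - sigma0**2) / (2.0 * sigma0**2 * sigma1**2)  # Delta A
    delta_c = -math.log(sigma1 / sigma0)                               # Delta C
    # cumulative sum of B(x_j) + Delta C / Delta A, with S_0 = 0
    sums = [0.0] + list(accumulate((x - mu) ** 2 + delta_c / delta_a for x in xs))
    # log{L_i / L_n} = Delta A * (S_n - S_i) for i < n
    scores = [delta_a * (sums[n] - s) for s in sums]
    # H_n carries prior weight c, so it competes with log c instead of 0
    scores[n] = math.log(c)
    return max(range(n + 1), key=scores.__getitem__)


if __name__ == "__main__":
    data = [1, -1, 1, -1, 3, -3, 3, -3]
    print(variance_change_point(data, mu=0.0, sigma0=1.0, sigma1=3.0))
```

Working with the log likelihood ratio directly covers both signs of $\Delta A$: for $\Delta A > 0$ it selects the minimum of the sum, for $\Delta A < 0$ the maximum.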

3. A ONE-SIDED TEST

The procedure of §2 is one that can reasonably be used as a test of the hypothesis that the observations are all drawn from the same population, $F(x \mid \theta)$. If $F(x \mid \theta)$ is of Koopman's form the appropriate cumulative sum is plotted and the null hypothesis rejected if the final point of the sample path is at least a given distance above its minimum ($\Delta A > 0$) (Fig. 1). We note that in this test the criterion is the distance of the end-point of the path above the minimum value, whereas in the earlier test (Page, 1955) we consider the greatest distance of the path above the minimum. The properties of this test are likely to be difficult to calculate as it is equivalent to a truncated sequential test. However, an approximation to the power function which is valid for large samples and when a large rise in the cumulative sum is required for the rejection of $H_n$ may be obtained from the normal diffusion process in the presence of an absorbing barrier. Since the test criterion is a rise in the cumulative sum at the end of the path above its minimum, we may express it in terms of the end-point, as a fall of the path below that value. Thus if we look at the sample in the reverse order, the criterion is whether or not the cumulative sum crosses a fixed horizontal boundary within $n$ steps; that is, the problem is that of a truncated random walk in the presence of an absorbing barrier.

A very simple approximation to the probability that the boundary is crossed within $n$ steps when the increments have some symmetrical distribution about zero mean and standard deviation $\sigma$, has been given by Armitage (1957). Let $p_i$ be the probability that the cumulative sum exceeds a constant




$k$ for the first time after the $i$th observation. Then the probability that the final sum exceeds $k$ given that the sum first did so after the $i$th observation is slightly greater than $\tfrac12$. Hence by summation,

$$\Pr\{S_n \ge k\} \gtrsim \tfrac12 \sum_{i=1}^{n} p_i,$$


Fig. 1. One-sided test. Sampling experiment on random normal deviates; for $n = 1, \ldots, 20$, $\mu = 0$, $\sigma = 1$; for $n = 21, \ldots, 50$, $\mu = 0·5$, $\sigma = 1$. Cumulative sum plotted is $\sum (x - 0·25)$. [Axes: cumulative sum against number of observations.]


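where the symbol $\gtrsim$ means 'slightly greater than'. Hence

$$\Pr(\text{boundary is crossed}) \approx 2\Big[ 1 - \Phi\Big( \frac{k}{\sigma\sqrt{n}} \Big) \Big] \qquad (3·1)$$

using the central limit theorem, where $\Phi(t) = (2\pi)^{-1/2} \int_{-\infty}^{t} \exp(-\tfrac12 x^2)\, dx$.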


A more general approximation for increments with a non-zero mean, which we need here, is obtained by replacing the random walk in discrete time by one in continuous time. For such a process in which the increment per unit time has mean $m$ and variance $\sigma^2$, the frequency function of the cumulative sum at time $T$ is

$$f(y, b, T) = \frac{1}{\sigma\sqrt{2\pi T}} \Big[ \exp\Big\{ -\frac{(y - mT)^2}{2\sigma^2 T} \Big\} - \exp\Big\{ \frac{2bm}{\sigma^2} - \frac{(y - 2b - mT)^2}{2\sigma^2 T} \Big\} \Big], \qquad (3·2)$$

where the boundary is at $y = b$ $(b > 0)$ (e.g. Bartlett, 1955, p. 48). The probability that the boundary has not been crossed in the interval $(0, T)$ is therefore

$$g(b, T) = \int_{-\infty}^{b} f(y, b, T)\, dy = \Phi\Big( \frac{b - mT}{\sigma\sqrt{T}} \Big) - e^{2bm/\sigma^2}\, \Phi\Big( \frac{-b - mT}{\sigma\sqrt{T}} \Big). \qquad (3·3)$$


Hence given $m$ and $\sigma^2$, the mean and variance of the increment, and $T = n$, the number of observations, we can apply this result directly to our test to find the position, $b$, of the boundary so that the hypothesis that all the $x$'s come from the specified distribution will be rejected when it is true with probability approximately any given value. A few values are shown in Table 1 for a probability 0·05 of Type I error. For example, to test whether in fifty observations a change from $\mu$ to $\mu + 0·2\sigma$ occurs in the mean of a normal distribution, the sum $\sum (x_i - \mu - 0·10\sigma)$ is plotted in the presence of a boundary at $10·0\sigma$.


Table 1. Approximate values of $b/\sigma$

  m/σ     | n = 25 | n = 50 | n = 100
  −0·025  |  9·3   |  12·8  |  17·6
  −0·05   |  8·8   |  11·8  |  15·?
  −0·10   |  7·8   |  10·0  |  12·2
  −0·20   |  6·1   |   6·9  |   7·?
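
The boundary positions can be checked numerically from (3·3). The sketch below is an illustration, not the author's computation: the function names and the bisection search are assumptions, and $\Phi$ is evaluated through the error function. It solves $1 - g(b, n) = 0·05$ for $b$.

```python
import math


def phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def prob_not_crossed(b, m, sigma, T):
    """g(b, T) of (3.3): probability the boundary y = b is not crossed in (0, T)."""
    root = sigma * math.sqrt(T)
    return phi((b - m * T) / root) - math.exp(2.0 * b * m / sigma**2) * phi((-b - m * T) / root)


def boundary(m, sigma, n, alpha=0.05, tol=1e-6):
    """Boundary b giving crossing probability alpha (bisection; hypothetical helper)."""
    lo, hi = 0.0, 10.0 * sigma * math.sqrt(n) + abs(m) * n
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # the crossing probability decreases as the boundary moves away
        if 1.0 - prob_not_crossed(mid, m, sigma, n) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for m in (-0.025, -0.05, -0.10, -0.20):
        print(m, [round(boundary(m, 1.0, n), 1) for n in (25, 50, 100)])
```

With $\sigma = 1$ the increment mean $m$ plays the role of $m/\sigma$ in Table 1, so the printed rows can be compared directly with the tabulated $b/\sigma$.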


This diffusion approximation is of the same kind as the Wald approximation for the characteristics of sequential tests and, as the latter are usually sufficient for practical purposes, it is reasonable to suppose that in this case the same holds. By an enumeration of paths to the boundary, it is possible to evaluate exactly the probabilities for the binomial case and these agree fairly well with those given by the approximate formulae (Armitage, 1957).
4. TWO-SIDED CASE

Let us now consider the case where the alternatives to the hypothesis that the observations all come from a distribution $F(x \mid \theta)$ are that the first $i$ observations come from $F(x \mid \theta)$ and the remainder all come from either $F(x \mid \theta')$ or $F(x \mid \theta'')$, where $\theta' > \theta > \theta''$, i.e. the change in the parameter may be in either direction. We suppose $\theta$, $\theta'$, $\theta''$ known. Then in the discrimination problem we have $2n + 1$ completely specified hypotheses: $H_n$; $H_i^+$, $H_i^-$ $(i = 0, 1, \ldots, n - 1)$, where $H_i^+$ $(H_i^-)$ are the hypotheses that the first $i$ observations come from $F(x \mid \theta)$ and the remainder from $F(x \mid \theta')$ $(F(x \mid \theta''))$; the superfix $+$, $-$ thus indicates the direction of the change in $\theta$. As before, we divide the whole sample space into $(2n + 1)$ mutually exclusive regions $R_n$, $R_i^+$, $R_i^-$ $(i = 0, \ldots, n - 1)$ such that the region in which the sample point falls indicates the hypothesis preferred. When the hypotheses are a priori equally likely the probability of misclassification (or the expected loss if the weighted losses are equal) is minimized if the regions are defined by:

$$\mathbf{x} \in R_i^+ \quad \text{if} \quad L_i^+(\mathbf{x}) \ge \max\big( L_j^+(\mathbf{x}),\, L_k^-(\mathbf{x}),\, L_n(\mathbf{x}) \big) \quad (0 \le j, k \le n - 1;\ j \ne i) \quad (0 \le i \le n - 1),$$

$$\mathbf{x} \in R_n \quad \text{if} \quad L_n(\mathbf{x}) \ge \max\big( L_j^+(\mathbf{x}),\, L_k^-(\mathbf{x}) \big), \qquad (4·1)$$

with a similar definition for $R_i^-$, where $L_i^+(\mathbf{x})$, $L_i^-(\mathbf{x})$, $L_n(\mathbf{x})$ are the likelihoods of the corresponding hypotheses. A comparison of the $H^+$'s ($H^-$'s) with $H_n$ reduces to the one-sided criterion for discrimination. If $H_n$ is preferred under both one-sided procedures, it is preferred under the two-sided procedure; if $H_n$ is preferred by one and another hypothesis, say $H_i^+$, by the other, then $H_i^+$ is preferred by the two-sided procedure. If each one-sided procedure prefers some hypothesis other than $H_n$ it is necessary to use the additional conditions implied in (4·1) to choose which of the two is preferred by the two-sided procedure.

Example 3. Mean of a normal distribution

Suppose that the three possible means are $0$, $\pm\mu$ and that the variance is unity. Then $H_n$, i.e. that no change has occurred, is preferred if

$$\max_{0 \le i \le n-1} \sum_{k=i+1}^{n} (x_k - \tfrac12\mu) \le 0 \quad \text{and} \quad \min_{0 \le j \le n-1} \sum_{k=j+1}^{n} (x_k + \tfrac12\mu) \ge 0. \qquad (4·2)$$

If just one of these inequalities is unsatisfied, suppose it is the first and that the maximum occurs for $i = m$; then $H_m^+$ is preferred. If both inequalities are unsatisfied, the maximum and minimum occurring at $i = m$, $j = m'$, then $H_m^+$ is preferred to $H_{m'}^-$ if

$$\sum_{k=m+1}^{n} (x_k - \tfrac12\mu) \ge -\sum_{k=m'+1}^{n} (x_k + \tfrac12\mu). \qquad (4·3)$$

Conversely, $H_{m'}^-$ is preferred if the inequality is reversed.
For a graphical representation it is convenient to consider the graph of the cumulative sum of the observations themselves, $S_j = \sum_{k=1}^{j} x_k$, against $j$. The conditions (4·2) and (4·3) are then expressible as whether or not the lines $y - S_n = \pm\tfrac12\mu(x - n)$ contain the path and, if the path intersects both lines, which of the intersections is the deeper. The lines concerned are those drawn through the end point of the path, with slopes $\pm\tfrac12\mu$.
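
A minimal Python sketch of the two-sided rule for a normal mean with unit variance and possible means $0$, $\pm\mu$ follows. It is an assumption-laden illustration: the function name and return convention are hypothetical, and the quantities compared are the log likelihood ratios against $H_n$ divided by $\mu$, namely $\sum_{k>i}(x_k - \tfrac12\mu)$ for $H_i^+$ and $\sum_{k>i}(-x_k - \tfrac12\mu)$ for $H_i^-$, which is equivalent to asking which line through the end point the path penetrates most deeply.

```python
from itertools import accumulate


def two_sided_choice(xs, mu):
    """Return ('none', n), ('+', i) or ('-', i) for the preferred hypothesis.

    Hypothetical helper; assumes unit variance and means 0, +mu, -mu.
    """
    n = len(xs)
    sums = [0.0] + list(accumulate(xs))        # S_0, ..., S_n of the raw data
    best_kind, best_i, best_score = "none", n, 0.0   # H_n has log ratio 0
    for i in range(n):
        tail = sums[n] - sums[i]               # x_{i+1} + ... + x_n
        plus = tail - 0.5 * mu * (n - i)       # log{L_i^+ / L_n} / mu
        minus = -tail - 0.5 * mu * (n - i)     # log{L_i^- / L_n} / mu
        if plus > best_score:
            best_kind, best_i, best_score = "+", i, plus
        if minus > best_score:
            best_kind, best_i, best_score = "-", i, minus
    return best_kind, best_i


if __name__ == "__main__":
    print(two_sided_choice([0, 0, 0, 2, 2, 2], mu=2.0))
```

Strict inequalities are used so that ties resolve in favour of no change and then of the earliest change point; that tie-breaking is a choice made here, not one stated in the text.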


More generally, for distributions of Koopman's form with a sufficient statistic and for which $A(\theta)$ is monotonic, the cumulative sum $\sum_{j=1}^{i} B(x_j)$ can be plotted and lines with slopes

$$s' = -\frac{C(\theta') - C(\theta)}{A(\theta') - A(\theta)}, \qquad s'' = -\frac{C(\theta'') - C(\theta)}{A(\theta'') - A(\theta)}$$

drawn through the end point. It may be noted that when the procedure is applied as it stands to a sequence of 0, 1 values from a binomial population then $H_n$, the hypothesis that no change has occurred, is never preferred; if the last observation is '1', one of the $H^+$ hypotheses is preferred to $H_n$, and similarly if $x_n = 0$. This disadvantage does not necessarily persist if the observations are grouped.

If the hypothesis of no change, $H_n$, is a priori $c$ times as likely as any of the alternative hypotheses the procedure gives rise to two sets of inequalities like (2·11) for the comparisons between $H_n$ and the $H^+$'s or the $H^-$'s; in the case of Koopman's distributions these yield boundaries with slopes $s'$, $s''$, which pass through the points $(n, S_n + d(\theta, \theta'))$, $(n, S_n + d(\theta, \theta''))$, where $d(\theta, \theta^*)$ is a function of $\theta$, $\theta^*$ determined by the distribution of $x$. For comparisons of one of the $H^+$'s with an $H^-$, the condition is the same as when $c = 1$, i.e. it depends upon the deeper of the intersections with the above boundaries with $d \equiv 0$ (if $d(\theta, \theta'') = -d(\theta, \theta')$ this is of course equivalent to preferring the hypothesis indicated by the deeper intersection of the boundaries as drawn). An approximation to the positions of the boundaries if this procedure is to be used as a test can be obtained from the one-sided test and its diffusion process approximation. Armitage (1957) has shown that the error in this approximation is small enough to be neglected in most practical cases.


5. CONCLUSION AND SUMMARY

A one-sided test for a change in a parameter within a set of observations in the order drawn is given, and an approximation to the boundary is derived. The test is shown to arise from a discrimination procedure and the corresponding two-sided test is quoted. These tests are similar to the 'restricted sequential procedures' proposed by Armitage in another context, where divergent boundaries and an upper limit to the sample size are imposed at the outset. Here our sample size is fixed and the boundaries are drawn at the end of the sampling.

It is interesting also to compare the procedures given here with those derived by Rao (1950) for sequential tests of null hypotheses when nothing is specified about the alternative hypotheses. Rao's approach used the locally most powerful best test to find a criterion which, if satisfied in less than a fixed number of observations, would lead to the rejection of the null hypothesis. Thus for a one-sided test of the mean, $\mu$, of a normal population of known standard deviation, Rao's test is to reject the null hypothesis if $\sum_{j=1}^{i} (x_j - \mu) \ge k$ for some $i$ $(0 < i \le N)$, which is similar to our test when it is looked at in the reverse direction, although the cumulative sum plotted is different.

Rao's test for distributions of Koopman's form considers the cumulative sums of $B(x) A'(\theta) + C'(\theta)$, instead of using the differences as in (2·9), and will, of course, have a criterion different from (2·9) in general. For example, Rao's test for the standard deviation of a normal population of known mean plots a cumulative sum which has a mean path of slope zero on the null hypothesis while the cumulative sum of our procedure has mean path with slope different from zero on all the hypotheses.

I wish to thank Dr P. Armitage for reading this note in draft and for allowing me to see his paper on Restricted Sequential Procedures before publication.
Testing for departure from the exponential distribution*

By D. J. BARTHOLOMEW
Scientific Department, National Coal Board, London

1. INTRODUCTION

Occasions arise in statistical practice when it is necessary to test whether a random variable, $t$, has the exponential distribution

$$p(t) = \lambda e^{-\lambda t} \quad (t \ge 0)$$

on the basis of an observed sample containing $n$ independent observations, say, $t_1, t_2, \ldots, t_n$. Examples are to be found in life testing and the study of the distribution of intervals between events occurring in time or space (see, for example, Maguire, Pearson & Wynn, 1952). The object of this paper is to compare the power of three tests of whether a sample is exponentially distributed. The statistics considered are:

$$M = -2\Big( \sum_{i=1}^{n} \log_e t_i - n \log_e \bar{t} \Big),$$

$$S = \sum_{i=1}^{n} t_i^2 \Big/ \Big( \sum_{i=1}^{n} t_i \Big)^2,$$

$$\omega = \sum_{i=1}^{n} |t_i - \bar{t}| \Big/ (2n\bar{t}),$$

where $\bar{t} = \dfrac{1}{n} \sum_{i=1}^{n} t_i$.
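
For concreteness, the three statistics can be computed as in the short Python sketch below. The function name is an illustrative assumption; the formulas are those just defined. Note that $M$ is undefined if any observation is recorded as zero.

```python
import math


def exponentiality_statistics(ts):
    """Return (M, S, omega) for a sample of positive observations ts."""
    n = len(ts)
    total = sum(ts)
    mean = total / n
    # M = -2 (sum log t_i - n log t_bar)
    m_stat = -2.0 * (sum(math.log(t) for t in ts) - n * math.log(mean))
    # S = sum t_i^2 / (sum t_i)^2
    s_stat = sum(t * t for t in ts) / total**2
    # omega = sum |t_i - t_bar| / (2 n t_bar)
    omega = sum(abs(t - mean) for t in ts) / (2.0 * n * mean)
    return m_stat, s_stat, omega


if __name__ == "__main__":
    print(exponentiality_statistics([0.3, 1.2, 0.7, 2.5, 0.1]))
```

All three statistics are unchanged when every observation is multiplied by the same constant, so none of them depends on the unknown $\lambda$.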
These criteria have been discussed in the work of a number of writers, in particular by Darling (1953), who gave a unified treatment of their distribution theory.

The power function of a test can only be found by reference to some alternative to the null hypothesis. Which alternative is appropriate in a given circumstance depends entirely on the particular nature of the problem in hand. In this paper, therefore, our aim will be the general one of trying to answer such a question as, 'Under what kind of alternatives is $S$ more (or less) powerful than $\omega$?' Such information should be of value in deciding which test to choose. Four alternatives will be considered and the relative power of the three tests determined with respect to each of them. It has not been possible to obtain the exact power functions or even good approximations to them in the majority of cases. Instead, use has been made of the well-known result concerning the asymptotic relative efficiency of tests. An instructive discussion of this method has been given by Cox & Stuart (1955). In one case (see §4 below) where an approximation to the power of both $M$ and $S$ has been obtained, the asymptotic method is shown to be satisfactory for sample sizes of the order of 100.
2. ALTERNATIVES TO THE EXPONENTIAL

The four alternatives with which we shall be concerned have been selected because they show the strong and weak points of the tests. For a full discussion of the ways in which these and other alternatives can arise reference may be made to Bartholomew (1955).

Alternative I

$$p(t) = \lambda^a t^{a-1} e^{-\lambda t} / \Gamma(a) \quad (t \ge 0,\ a > 0).$$

Moran (1951) studied this alternative and showed that $M$ provides the asymptotically most powerful test of the hypothesis $a = 1$. The curve has a zero ordinate at $t = 0$ if $a > 1$ and an infinite ordinate there if $a < 1$.

Alternative II

$$p(t) = a\lambda (\lambda t)^{a-1} \exp\{-(\lambda t)^a\} \quad (t \ge 0,\ a > 0).$$

This distribution is associated with the name of Weibull. It is similar in shape to I above, the ordinate at $t = 0$ being equal to infinity and zero for $0 < a < 1$ and $a > 1$ respectively.

Alternative III

$$p(t) = \lambda (1 + a\lambda t)^{-(1 + 1/a)} \quad (t \ge 0,\ a > 0).$$

In contrast to the preceding alternatives, this curve is always J-shaped with a finite ordinate at $t = 0$. … gave it as an alternative to the exponential. It is a special case of a class considered by …n (1941) in connexion with testing for homogeneity of variance. He arrived at the criterion … for testing the hypothesis $a = 0$. In the Pearson system this is a Type XI curve.

Alternative IV

$$p(t) = \lambda e^{-\lambda t} \{ 1 + a(\tfrac12\lambda^2 t^2 - 2\lambda t + 1) \} \quad (t \ge 0).$$

This curve is similar in shape to III. It is a special case of the Laguerre series which is obtained from the Pearson Type III distribution in the same way as the Gram-Charlier curves are obtained from the normal distribution. Cox & Smith (1952) gave it as a limiting form of the distribution of intervals between the events in a 'pooled output'.

* The work contained in this paper is based on part of a thesis approved for the Ph.D. degree in the University of London.


3. POWER OF THE TESTS

The analysis of this section is based on the following result quoted from Cox & Stuart (1955).

If there are two consistent test statistics $\phi_1$ and $\phi_2$ of a hypothesis $H_0$: $\theta = \theta_0$, the asymptotic relative efficiency (A.R.E.) is the reciprocal of the ratio of sample sizes required to attain the same power against the same alternative $H_1$, taking the limit as the sample sizes tend to infinity and as $H_1$ tends to $H_0$. If $\phi_1$ and $\phi_2$ both have normal limiting distributions on $H_0$ and $H_1$, the A.R.E. of $\phi_1$ compared to $\phi_2$ is given by

$$\text{A.R.E.}(\phi_1, \phi_2) = \lim_{n \to \infty} \Big( \frac{R(\phi_1)}{R(\phi_2)} \Big)^{1/r},$$

where

$$R^2(\phi_i) = \Big\{ \frac{\partial}{\partial\theta} \mathcal{E}(\phi_i \mid \theta) \Big\}^2_{\theta = \theta_0} \Big/ \operatorname{var}(\phi_i \mid \theta = \theta_0),$$

provided that $r$ satisfies the equations

$$\lim_{n \to \infty} R(\phi_i)\, n^{-r} = R_i \quad (i = 1, 2).$$

The $R_i$ are constants independent of $n$, which is the sample size. In all cases considered in this paper $r = \tfrac12$. For brevity we shall write

$$E = \Big[ \frac{\partial}{\partial\theta} \mathcal{E}(\phi \mid \theta) \Big]_{\theta = \theta_0}, \qquad V = \operatorname{var}(\phi \mid \theta = \theta_0).$$

It should be noted that it is only necessary to know the limiting values of $E$ and $V$ in order to calculate the A.R.E. This fact will enable us to find $\lim R^2$ when the mean value of the criterion under the alternative cannot be found exactly.


As to the condition of limitin 
condition on the null hypothesis. 


light of this remark. 
We now proceed to calculate $E$ for $M$, $S$ and $\omega$ under the four alternatives listed above. For alternatives I and II the null hypothesis corresponds to $a = 1$ and for III and IV to $a = 0$.

I. (a) When the observations are distributed in the Type III form it can be readily shown that

$$\mathcal{E}(M) = -2\{ n \log_e n + n\psi(a) - n\psi(na) \},$$

where $\psi(x) = \dfrac{d}{dx} \log_e \Gamma(x)$. Differentiating we find

$$\frac{\partial \mathcal{E}(M)}{\partial a} \Big|_{a=1} = -2\{ n\psi'(1) - n^2\psi'(n) \},$$

which, taking the limit as $n$ tends to infinity, gives

$$\lim E = -2n(\psi'(1) - 1). \qquad (1)$$

We write 'lim' for $\lim_{n \to \infty}$ throughout this section.

(b) The expected value of $S$ under this alternative was shown by Moran (1951) to be $(a + 1)/(an + 1)$. This leads immediately to the result

$$\lim E = -1/n. \qquad (2)$$

(c) The exact value of $\mathcal{E}(\omega \mid a)$ has not been obtained but we have

$$\lim \mathcal{E}(\omega \mid a) = \tfrac12 (\text{Type III mean deviation}) / (\text{Type III mean}).$$

Straightforward calculation shows this to be $I_a(a) - I_a(a + 1)$, where

$$I_x(a) = \int_0^x t^{a-1} e^{-t}\, dt \Big/ \Gamma(a).$$

Differentiating and simplifying, we ultimately find

$$\lim E = \gamma/e + 1 - 2/e - 0·6321, \qquad (3)$$

where $\gamma$ = Euler's constant = 0·5772....

II. (a) For the Weibull distribution we find

$$\lim \mathcal{E}(M) = 2n\{ \log_e \Gamma(1 + 1/a) - \psi(1)/a \}$$

and hence

$$\lim E = 2n(\psi(1) - \psi(2)) = -2n. \qquad (4)$$

(b)

$$\lim \mathcal{E}(S) = \Gamma(1 + 2/a) \big/ \big[ n \{\Gamma(1 + 1/a)\}^2 \big]$$

and

$$\lim E = 4(\psi(2) - \psi(3))/n = -2/n. \qquad (5)$$

(c) The limiting value of the mean of $\omega$ takes the rather complicated form

$$\lim \mathcal{E}(\omega) = 1 - \sum_{j=0}^{\infty} \frac{(-1)^j x^j}{j!\,(aj + 1)},$$

where $x = \{\Gamma(1 + 1/a)\}^a$; thus

$$\lim E = -\Big[ \psi(2)(1 - 2/e) + \sum_{j=0}^{\infty} \frac{(-1)^j}{j!\,(j + 2)^2} \Big].$$

Summing the series on the right numerically we reach the result

$$\lim E = -(\psi(2)(1 - 2/e) + 0·1645). \qquad (6)$$

III. (a) Under the third alternative

$$\lim \mathcal{E}(M) = 2n\{ a(1 - \tfrac12) - \psi(1) \} + O(a^2)$$

and therefore

$$\lim E = 2n(1 - \tfrac12) = n. \qquad (7)$$

(b)

$$\lim \mathcal{E}(S) = \frac{2(1 - a)}{n(1 - 2a)}, \qquad \lim E = 2/n. \qquad (8)$$

(c)

$$\lim \mathcal{E}(\omega) = (1 - a)^{(1/a) - 1}, \qquad \lim E = \tfrac12 e^{-1}. \qquad (9)$$

IV. (a) Under the fourth alternative we find

$$\lim \mathcal{E}(M) = -2n\{ \psi(1) - \tfrac12 a \}, \qquad \lim E = n. \qquad (10)$$

(b)

$$\lim \mathcal{E}(S) = 2(1 + a)/n, \qquad \lim E = 2/n. \qquad (11)$$

(c)

$$\lim \mathcal{E}(\omega) = e^{-1}(1 + \tfrac12 a), \qquad \lim E = \tfrac12 e^{-1}. \qquad (12)$$

The foregoing results are brought together in Table 1. For each alternative the criteria are compared with the best test to give the A.R.E. The roman numerals at the head of the columns refer to the numbering of the alternatives.

4. APPLICABILITY OF …

The results of §3 are asymptotic properties of the tests. The object of this section is to see how far these limiting properties hold good in finite samples for one special case. Approximations to the power of both $M$ and $S$ can be obtained under the first alternative by fitting curves having the same moments. The calculation of $\beta_1$ and $\beta_2$ for $M$ under this alternative suggests using an approximation of the form …, where $C$ and $\nu$ are determined from the first two moments of $M$. Investigations into the form of the distribution of $S$, in the null case (Bartholomew, 1955), suggest that it might be fairly well approximated by a lognormal distribution.

According to the figures in Table 1 we should expect the power of S using 100 observations to be almost 
the same as that of M on 39 observations. The power functions obtained in the way described above are 
compared in Table 2. In view of the approximate nature of the comparison the agreement is good; this 
suggests that the asymptotic properties may be used with some confidence for sample sizes of this order. 


Table 1. Values of the asymptotic relative efficiency of $M$, $S$ and $\omega$ under four alternatives

  Test | Limiting variance on null hypothesis | Alternative law: I | II   | III  | IV
  M    | 4n(ψ′(1) − 1)                        | 1·00               | 1·00 | 0·38 |
  S    | 4n⁻³                                 | 0·39               | 0·64 | 1·00 | 1·00
  ω    | 0·0591n⁻¹                            | 0·63               | 0·83 | 0·57 | 0·57
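
The entries of Table 1 can be recomputed from the limiting values of $E$ in (1) to (12) and the limiting null variances. The Python sketch below is an illustration only: the dictionary layout and function name are assumptions, and the powers of $n$ have been divided out by hand (each $R^2$ is proportional to $n$), so that with $r = \tfrac12$ the A.R.E. is simply a ratio of the scaled $R^2$ values.

```python
import math

EULER_GAMMA = 0.5772156649015329
TRIGAMMA_1 = math.pi**2 / 6        # psi'(1)
PSI_2 = 1.0 - EULER_GAMMA          # psi(2)

# Limiting null variances with the power of n removed:
# var(M) ~ 4n(psi'(1) - 1), var(S) ~ 4 n^-3, var(omega) ~ 0.0591 n^-1.
V = {"M": 4.0 * (TRIGAMMA_1 - 1.0), "S": 4.0, "omega": 0.0591}

# Limiting E with the power of n removed (E/n for M, nE for S, E for omega).
E = {
    "I": {"M": -2.0 * (TRIGAMMA_1 - 1.0), "S": -1.0,
          "omega": (EULER_GAMMA - 1.0) / math.e},
    "II": {"M": -2.0, "S": -2.0,
           "omega": -(PSI_2 * (1.0 - 2.0 / math.e) + 0.1645)},
    "III": {"M": 1.0, "S": 2.0, "omega": 0.5 / math.e},
    "IV": {"M": 1.0, "S": 2.0, "omega": 0.5 / math.e},
}


def are_table():
    """A.R.E. of each test relative to the best test under each alternative."""
    table = {}
    for alt, values in E.items():
        r2 = {test: values[test] ** 2 / V[test] for test in values}  # R^2 / n
        best = max(r2.values())
        table[alt] = {test: r2[test] / best for test in r2}
    return table


if __name__ == "__main__":
    for alt, row in are_table().items():
        print(alt, {test: round(value, 2) for test, value in row.items()})
```

The recomputed ratios agree with the legible entries of Table 1 to within a unit in the second decimal place.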


Table 2

  a             | 1·0   | 0·9   | 0·8   | 0·7   | 0·6   | 0·5   | 0·4
  M (39 obs.)   | 0·050 | 0·142 | 0·326 | 0·591 | 0·840 | 0·969 | 0·998
  S (100 obs.)  | 0·050 | 0·140 | 0·316 | 0·560 | 0·794 | 0·939 | 0·990


5. CONCLUDING REMARKS 


Table 1 shows that the form of the alternative has a pronounced effect on the relative power of the three
tests considered. On alternatives I and II, M is clearly the best test to use while the very reverse is true
for III and IV. The reason for this can be appreciated by noting that it is the relatively small intervals
that are most important in determining whether M is significant. We should therefore expect M to be
the most powerful under alternatives showing a marked departure from the exponential near t = 0.
This is in fact the case for I and II which take either zero or infinite values at this point. On the other
hand, the value of S, which is a sum of squares, is influenced most by the largest values and so would be
sensitive to departure from the exponential at the upper tail. The w test takes up an intermediate
position in both cases, giving less weight to the small values than does M and more than S, and vice versa
for the large values.


In some instances it may be possible to decide, a priori, on an alternative distribution, but it is doubtful
whether this will be so in general. Under the latter circumstance w could be used on the grounds that it
will always be second best, whereas either of M or S could be the worst. In this way we minimize the
maximum loss in power which might be incurred by a wrong choice of test.

Even if it is known that M provides the best test, it can only be used in practice if the small values of
t are recorded to high accuracy. This is not often the case, in fact very small values are frequently
recorded as zero, thus making M infinite. Bearing in mind that the significance of M can depend to a large
extent on inaccuracies in recording, it would be safer to use w when there is any doubt.


My thanks are due to Dr N. L. Johnson, for his advice throughout this work, and to the Department 
of Scientific and Industrial Research for the award of a maintenance grant during the period when the 
research was carried out. 
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The distribution of range in normal samples with n = 200 


By B. I. HARLEY AND E. S. PEARSON
University College, London


1. INTRODUCTORY

In his paper of 1925, Tippett considered the distribution of the range, w, in samples from a normal
population, chiefly from the point of view of the moments, over the whole stretch from samples of n = 2
to n = 1000. He tabled the expected value of the range (in terms of the population standard deviation,
$\sigma$) to five decimal places for n = 2(1)1000, but only gave values of the standard deviation and of
$\beta_1$ and $\beta_2$ for n = 2, 10, 20, 60, 100, 200, 500, 1000. His standard deviations were given to three
decimal places, but he considered that little reliance could be placed on the third decimal place.
After 1925, attention was focused on the distribution of range in small samples, largely
because of the usefulness of this statistic as a measure of variability in industrial quality control. Thus
Pearson & Hartley's (1942) tables gave the probability integral of w to four decimal places for n = 2(1)20,
by intervals of 0.05 in w.
ave been made* that a comparison of the ran; 
e a useful purpose in quite large samples, either as & test of homo- 
tation. Tukey (1955) discussed certain methods of 
Teenta, " ecess to calculate such quantities as the 
ge points of w or of w/st within the tion which has been 
caleulations.] He pointed out, however, that satisfactory 
e highly accurate computations had been 
d derived values given below for the single 


ge and root mean-square 


ator: 
mators i 
5 ofthe population 7 may serv! 


Tesultg 
X hardly be established finally u 
en. = between, say, n = 100 and 1000. The tables and deri 
= 200 are presented as & contribution towards this objective. 
F RESULTS 


seven decimal 
e as descri! 
put it was decide 
B Table 2 giving the prob: 
down to 8% ii 0-05, was derived by in: 
alculate the mom 


2, DESCRIPTION o 

places at argument intervals of 
bed in $3 below. The last figure 
d to put the seven-figure 
ability integral to 
terpolation from 
ents of the dis- 


1 of the range, w» to 


pendently by 


"abi 
elo 

Bives the probability integra! 
the middle of the table, 


0-25 
. Ea 
ay 6 "s value was computed inde 
“Sults on error by one or two units in 
record, rather than to cut 


fo 
t ecim; 
d "mal places at the more convenient argumen 
ihe latter it was a 


trip we fi 
tibuti © figures of Table 1. From t 
e formula 
i © ww Oa 2 
0 


10n 
of w by quadrature, using th 
pw) = (— Wo) +” 


ded example, David, Hartley & Pearson (1954). 
(b. dditi is the usual standard deviation estimate ofc 
Sar, ional values of the second moment of w atn E te 
A a 
224) highly acc in 1932, 25 follows: 


* 
caleulated fr 


30,45 and 


for Son, 
nag a. With Ruben’s (1954, P- » 
» 45, it is possible to improve o the figures given i 6 
Sample size 80 
; „435 684 0-389 01 
Variance of w 0-479 783 ees 06 0-623 7 
0-692 06 " 
Biom. 44 


St; 
5 andard deviation 
where $w_0$ is a convenient working origin near the mean, taken in this case as 5.5, and P(W)
is the probability integral. As usual in carrying out a quadrature of this kind, when we use the
available information to the limit, it is a little difficult to be sure of the exact degree of accuracy of the
results obtained. As a check, we computed moments by applying quadrature to Table 1 in three
ways: using (a) the seven decimal and (b) rounded off six decimal place values of P(W) at intervals of
0.25; (c) using only the alternate seven decimal place values, i.e. values at an argument interval of 0.50.


Table 1. Basic values of the probability integral 


  W   |    P(W)     |   W   |    P(W)     |   W   |    P(W)
      |             | 6.00  | 0.822 4992  | 9.00  | 0.999 9962
 3.25 | 0.000 0000  | 6.25  | 0.902 9409  |       |
 3.50 | 0.000 0003  | 6.50  | 0.950 5744  |       |
 3.75 | 0.000 0181  | 6.75  | 0.976 3683  |       |
 4.00 | 0.000 3917  | 7.00  | 0.989 3214  |       |
 4.25 | 0.004 0133  | 7.25  | 0.995 4158  |       |
 4.50 | 0.022 5924  | 7.50  | 0.998 1226  |       |
 4.75 | 0.079 3777  | 7.75  | 0.999 2642  |       |
 5.00 | 0.193 6026  | 8.00  | 0.999 7233  |       |
 5.25 | 0.357 9398  | 8.25  | 0.999 8999  |       |
 5.50 | 0.539 0411  | 8.50  | 0.999 9652  |       |
 5.75 | 0.700 6038  | 8.75  | 0.999 9883  |       |


Table 2. Derived table with smaller argument interval


w P(W) wW 


| PW) w PU) w Pav) w pon 
| i} "T = 
2s | | r | 9993 
3-75 | 0-0000 4-75 0-0794 5715 0-7006 675 | 09764 7:715 ON 
3-80 0000 | 4-80 — -0974 580 | 7284 6-80 -9798 7-80 gaos. | 
985 | 0001 | 485 | -1178 | 585 | -7546 | ess 9827 | 785 | je 
3-90 -0001 4-90 -1407 5-90 ‘7790 | 6-90 -0852 7-90 "9997 
3-95 | 0002 495 | -1660 595 | -8016 6-95 9874 qos | 799 
| | | 
| | | 
4-00 00004 | 500 | 0-1936 6-00 | 98225 | 7-00 | 0-9893 $00 | | 
4-05 -0007 505 | -2233 6-05 -8417 7-05 -9909 805 | 
410 | 0011 510 | -2548 610 | -8593 7-10 -9923 8-10 Ae 
4-15 -0017 5-15 +2880 6-15 | “8753 T5 -9935 8:15 9998 
4-20 -0027 | 520 :3325 | 6-20 | -8898 | 7-20 «9846 | 9-90 :9999 
I "m g | 
4:25 | 0-0040 595 | 0-3579 6-25 | 0-9029 | 7.95 0-9954 8.25 0:9999 
4-30 -0059 | : 3941 | 630 | -9147 | 7.39 -9962 | 8-30 -9999 
4-35 -0085 | 2908 | €35 | zs | mas | aoga | sag -9999 
4-40 -0121 | 4071 | 640 | -9347 7-40 9973 | 8-40 -9999 
ves -0167 | 050834 | 645 | -9431 | 7.45 9977 | $45 | 1:0000 
| ‘ 
4-50 0-0226 0-5390 6-50 0-9506 7-50 0-9981 
4-55 -0301 -5739 6:55 :9572 7:55 -9984 
4-60 | -0393 -6076 6-60 | -9629 7-60 -9987 
4-65 -0505 -6401 6-65 | -9680 7-65 -9989 
4-70 -0638 *6711 6-70 :9725 7-70 -9991 


t 
On the basis of these evaluations we believe that the following figures are not in error by more than
one or two units in the last place:

Mean w = 5.492 0853, $\sigma_w$ = 0.565 992, $\mu_2$ = 0.320 347,
$\mu_3$ = 0.090 935, $\mu_4$ = 0.353 16, $\beta_1$ = 0.251 53, $\beta_2$ = 3.4414.

From Table 1 we have obtained by backward interpolation the percentage points for w shown in Table 3.
It was found that if the tables of standardized 0.5, 1.0, 2.5 and 5% points of Pearson curves (Pearson &
Hartley, Table 42) are used with the moments given above, we obtain almost complete agreement
to two decimal places with the figures given in Table 3. This means that, to the accuracy considered, the
distribution of w must be very closely representable by a Type VI or inverted beta curve.
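
The standard deviation and the moment ratios quoted for n = 200 can be cross-checked from the central moments by the usual definitions $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$. The sketch below is an illustration only and is not the authors' computation; the variable names are arbitrary and the inputs are the rounded moments as printed.

```python
# Central moments of the range for n = 200, as quoted in the text.
mu2, mu3, mu4 = 0.320347, 0.090935, 0.35316

sigma_w = mu2 ** 0.5           # standard deviation of w
beta1 = mu3 ** 2 / mu2 ** 3    # squared skewness ratio
beta2 = mu4 / mu2 ** 2         # kurtosis ratio

print(f"sigma_w = {sigma_w:.6f}, beta1 = {beta1:.5f}, beta2 = {beta2:.4f}")
```

Agreement with the printed values to within rounding is a useful guard against transcription errors in the moments.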


Table 3. Percentage points of range in samples of n = 200 


Percentage   | 0.1   | 0.5   | 1.0   | 2.5   | 5.0   | 10.0
Lower points | 4.094 | 4.281 | 4.372 | 4.518 | 4.648 | 4.806
Upper points | 7.670 | 7.225 | 7.020 | 6.731 | 6.496 | 6.238

rcentage points for w/s given by 


curate values of the 


the values of the pe 
lly now that more a 


Fin . 
all iu 
y it seemed worth examining whether 


David 

et 5, 3 
igi. c (1954, p. 491) would be altered substantia 

is of w were available. Using the same method as these authors (p. 483) we derived the values 


Shown f, 

or Ci 1 . ] H 

attempte, LORS rison in Table 4. For practical purposes, the discrepancies are not serious. We have not 

est Gub E pe use our results to improve the empirical formulae suggested by Tukey (1955). To get the 
of his method of attack further accurate computat: d seem to be needed, perhaps as 


he sugge ions woul 
in the n sts for n = 1000. To carry these out, the desk computing methods we have used, as described 
ext section, would seem to involve & prohibitive amount of labour. 


Percentage points of w/s 


Table 4. 


n | Lower percentage points | Upper percentage points 
LM — — 


| | 
| | | - | 
5 | 05 10 25 | 59 50 25 18 os | 
— | | 
Day; aA | | | | | 
avid et al, | 4-50 | aga | ear || 228 osa | 659 | 685 703 | 
" ; | n D. s " 5. | 
L m improved moments | 453 | 4:59 | 4-68 4-18 | 6-39 | 6-60 | Hn ipm 
= I ! 
3. METHOD OF COMPUTATION

The probability integral of range in a normal sample of size n, P(W) = Pr{w < W}, may be expressed as

$P(W) = \left\{ \int_{-W/2}^{W/2} z(x)\,dx \right\}^{n} + 2n \int_{W/2}^{\infty} z(u) \left\{ \int_{u-W}^{u} z(x)\,dx \right\}^{n-1} du$, (2)

where $z(x) = (2\pi)^{-1/2} e^{-x^2/2}$. When n = 200 the main work of computation consists in evaluating the
integral in (2) or

$400 \int_{W/2}^{\infty} z(u) \left\{ \int_{u-W}^{u} z(x)\,dx \right\}^{199} du$ (3)
2 -w 
at i pan) was aimed at, it was necessary to caleulate 
in 
nt where the expression became 


Y q 
Quad. 
z(u) y iing, As seven decimal accuracy 
199 
less (jr (x) de) 
n Ms 
l unit in the 10th decimal p 


iW up to the poi 


sofu from 2 
ature had some- 


ed in the quadr: 


for a series of value: 
aces were taken from the 


ime, lace. Th 
S to be 

be as small as 0:025. Values of z(u) and on, the integral being 

rmal P j 


53) tables of i 
^ y as much as 0-005. 
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raised to the 199th power using tables of 15 figure logarithms. The first term in equation (2) was 
obtainable without quadrature, using the same tables. 
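
For readers who wish to reproduce entries of Table 1 without desk methods, the probability integral of the range can be evaluated numerically. The following is a minimal sketch, not the authors' procedure: it assumes the standard two-term form of the integral (a closed first term plus 2n times a quadrature over u from W/2 upwards), replaces the tables of the normal integral by `math.erf`, and uses Simpson's rule; the function names, step count and upper cut-off are arbitrary choices.

```python
import math


def z(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def big_phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def range_cdf(W: float, n: int = 200, steps: int = 20000, upper: float = 10.0) -> float:
    """Pr{range <= W} for a normal sample of size n (unit standard deviation)."""
    if W <= 0.0:
        return 0.0
    # First term: all n observations fall inside (-W/2, W/2); no quadrature needed.
    first = (2.0 * big_phi(W / 2.0) - 1.0) ** n

    def f(u: float) -> float:
        # One observation at u, the remaining n - 1 inside (u - W, u).
        return z(u) * (big_phi(u) - big_phi(u - W)) ** (n - 1)

    # Simpson's rule on (W/2, W/2 + upper); steps must be even.
    a = W / 2.0
    h = upper / steps
    total = f(a) + f(a + upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return first + 2.0 * n * total * h / 3.0


print(f"{range_cdf(5.5):.7f}")  # compare with the Table 1 entry at W = 5.50
```

For n = 2 the result can be compared with the closed form 2Φ(W/√2) − 1, which gives an independent check on the quadrature.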
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Studies in the history of probability and statistics. V. A note on playing cards 


By M. G. KENDALL 
Research Techniques Unit, London School of Economics 


1. In an earlier article in this series (Kendall, 1956) I referred briefly to the introduction of playing
cards into Europe. Subsequent correspondence arising out of that reference suggests that it may be 
useful to expand a little what was there said about the impact of cards on gambling. 

2. Playing cards as we know them to-day in western Europe can be traced back in a clear line of descent
to the beginning of the fifteenth century; but from that point backwards their history becomes more
and more vague and their genealogy more and more fabulous. Where they originated is unknown.
Claims have been put forward on behalf of origins in China, India, Arabia and Egypt. It is at least equally
possible that they were independently invented in Europe. From the first the pictorial representations
on the cards were thoroughly Western and do not suggest, to my eye at least, any trace of Eastern origin
such, for example, as does the rook in chess.* Gambling with paper tickets is said to have been known in
China in the twelfth century, and it is possible that the idea of playing cards drifted across to Europe
along one of the early trade routes; but for the translation of the idea into practice one does not need to
look further afield than fourteenth-century Italy.

3. No mention of cards has be 


authors like Chaucer and Dante, who mentioned everything, shows that they cannot have been known 


i 3 irl 
diterranean countries and Germany fairly 


refers to charticellae in quibus variae figurae pingantur, p: a 
» Queens, Valets and Chevaliers and, as I interpret him, the 


* Modern tarot packs are worthless evidence in this connexion. Under the influence of occultists,
notably Court de Gebelin, who suggested in the eighteenth century that the trump cards incorporated
the lost book of Thoth, and Eliphas Lévi, whose work on magic popularized the idea in the nineteenth
century, tarot cards have acquired symbols such as sphinxes which are absent from earlier western cards.

† The early suits were clubs, coins (diamonds), cups (hearts) and swords (spades), which San
Bernardino identifies with brutality, avarice, drunkenness and hatred. The court cards denote those who
are outstanding in these vices. 'Presbyterorum et presbyterarum innumerabilem multitudinem esse
volo: unde admittantur pusilli et magni, et feminae et masculi, periti et ignari, sapientes et stulti.'
A hundred and fifty years later one John Northbrooke was inspired by this passage to coin the famous
description of cards as 'the devil's picture books'.

Miss Gertrude Moakley (1956) has recently suggested that the trump cards were representations of
the Trionfi described by Petrarch in one of his most popular poems.
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4. The 
game of tarock (French i 
versions and is ench tarot, Italian tarocchi) is still played in P 
cards divided Ta the oldest card game. In its modern eo v» de eor a em "Mia 
Miss wi dics tee our es 21 trump cards and a wild card or joker, 78 pee a, at ped 
97 cards. 5 ype exist. There are some early packs of i ecl teen eny 
k. BO y even larger size, notab! inchi: 
ordinary cards, 35 trump cards and 6 wild cards. Historians of rsen ps 
3 also mention 


a very rare 
^ t ri a la 
set of engravings known as the Mantegna tarot; but they are probably not by Mante; 
à bably Mantegna, 
They are engraved on sizeable but thin sheets of 


a 
ie op much whether they are tarot cards. 
Biting: a iardly have been used in any sort of game involving shuffling and dealing; the 
sets of ten, the first, for example, enumerating various social grades from erac ir 
ird giving ten branches of learning, and so 


the Po 
on. adinim ao giving the nine Muses and Apollo, the th 
3 pinion is that these cards were a teaching device and that the unknown inventor of the 


Tot pack pied BER 
copied some of the W. M N 
T m. Where he got the others is a mystery unless Miss Moakley is right in 


identify; 
: ing them with Petrarch's Trionfi. 
5. At some point of time the tarot pack was simplified, or so I believe. The trump cards
were dropped, and of the 56 ordinary cards one of the court cards was also dropped. (In most countries
it was the Chevalier who was dropped, but in Spain it was the Queen, for reasons which it is interesting
to speculate upon.) Thus there evolved the basic pack of 52 cards which is in general
use to-day. The tarot pack survived independently, but in northern Europe is now mainly used for
fortune-telling.
6. The student of the history of probability is interested in these matters only in two respects: the
degree to which cards extended and encouraged gambling, and the reasons for the choice of the number
of cards in the early tarot packs. From what San Bernardino says it seems that gambling began at a very
early stage and that, as soon as a game came into existence, the adversaries began to wager on the
outcome. However, extensive gambling with cards was of very slow development. Cardano mentions
the game of primero, but early writers on chance confine themselves mainly to dice. The reasons, I think,
were twofold: first, the permutational arithmetic required to deal with probabilities at cards was too
complicated; secondly, cards were very expensive and dice much more common. Cards did not oust
dice until the eighteenth century. A third reason, possibly, is that cards and backgammon involved more
skill and had a higher social status. James I (1603) puts it rather well:
oom which, being empty, 


s—since they may at times supply the rı 

—Iwill not therefore agree wi 

Say d such like games 

Soldiers ve may ye lawfully play at the cards or tables; for, 

and as FE play at on the heads of their drums, beingonly ruled by hazard, 

co, Ses, he chess, I think it over-fond because it is overwise and philo 
“mpels mae was not very good at cl 


7 

: Tiei 

bu ^ * corse, to inquire why th 
Boo abbling in numerology let u$ note ho 


th the curiosity of some learned men 
d stormy weather, 


E ates 
or sitting, or human pastime 
h best deboshed 


Would yy 
? patent to pernicious idleness 


Our * 
thon p 22° in forbidding cards, dice an 


sophie & folly.’ 

hess, dmindedness in a Puritan age 
H214-10r 56-+35+6 cards, 
One can make out à Very 
lendar. The four suits 
months, the 52 cards 


nsisted of 56 
s a subject it is. 
rds and the ca 


e early tarot packs co 
w treacherou 
ack of playing C9 


Cas 
Correspon ix a connexion between the modern P 
0 t] d to the four seasons, the thirteen cards in a suit correspond to the lunar m 
; Knave, 12 for à Queen, and 13 for à King, and add all the 
7 for the joker, gives us the number of days in the 
i an this, but there is, in 


Wei 
eks of the year. If we score 11 for a 
adding one 


gued on less coine 


d the calendar. Th ds were 56 in number. 


e cards in the early 
ber of ways in 


identa 2 
e early suit car 


the number of th 


e 
ointa 
fo 
Year,» M; the 52 cards we got 364, which, 
act „ Many a histori i : 

> storical point has been ar; 


no oo; ri 
nnexion between the playing pack an! 


Pack, Neverthe} b n 
“ES a; ess, there are some striking resemblances etwee 4 
giin nd the number of ways in which dice can fall. The main number, Docs the article (1956) 
ae s Tee dice can be thrown permutations excluded, and I pointed out 10 p pre E aero! 
ta Sin S9 Ways were well known by the fourteenth century, The nume mr of the larger Minchiate 
rot e Which two dice can be thrown permutations excluded. er rou: e ; xu ho egored, or the 
i c ing three dice W: es 
arise either as the pumber of ways of pee Hobe ropared to lean very heavily on these 
having to choose somehow, 


Cop r o 
vei en Ways of throwing four (four-sided) astragali. I first packs. 
Te in: ces; but they suggest that per sien MR 


ue; 
K ns need by their knowledge of dice-throwing- m 
lis « T is thi n Ace, Two, ree, ++ 
Zone? is 59, ue the worst. The number of letters in the px Heine, Roi an din 
de jy Bube end in the sequence As, Deux, Tros. WO Iowe this information to 
us ^ Dame, Konig, the ch being taken as one 6477 


* and Co. Ltd. 


., Ten, Ji ‘ack, Queen: 
As, Zwei, Drei, ..- 
the firm of Thos. 
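
The counts on which the comparison between card packs and dice rests are easily verified by enumeration. The sketch below is an illustration only; the function name is hypothetical. It counts throws with permutations excluded (unordered throws) and also reproduces the calendar coincidence of 364 points on the 52 cards.

```python
from itertools import combinations_with_replacement


def unordered_throws(faces: int, dice: int) -> int:
    """Number of distinct throws of `dice` dice with `faces` faces each,
    permutations excluded."""
    return sum(1 for _ in combinations_with_replacement(range(1, faces + 1), dice))


two_dice = unordered_throws(6, 2)     # compare with the 21 trump cards
three_dice = unordered_throws(6, 3)   # compare with the 56 suit cards
astragali = unordered_throws(4, 4)    # four four-sided astragali: the 35 trumps

# Scoring Ace = 1, ..., Knave = 11, Queen = 12, King = 13 over four suits.
points_in_pack = 4 * sum(range(1, 14))

print(two_dice, three_dice, astragali, points_in_pack)
```

Adding one for the joker to the last figure gives the 365 days of the year, which is the coincidence the text warns against taking seriously.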
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9. In conclusion, I should like to correct one guess made in my earlier article. I suggested that the 
game of hazard was brought back to Europe by the third crusaders. It may, indeed, have been brought 
back in some such way, but if so, must have been imported by earlier crusaders. The word hasart occurs 
in line 10557 of Wace's Le Roman De Brut, dated A.D. 1155, and also in Chrétien de Troyes' Erec et
Enide, line 356, dated A.D. 1160-70. For these references I am indebted to Prof. Brian Woledge,
who remarks, incidentally, that the appearance of the initial ‘h’ in ‘hazard’ is an etymological 
mystery which has never been solved. 
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A singularity in the estimation of binomial variance 


By ALAN STUART 
Research Techniques Unit, London School of Economics 


SUMMARY. For the symmetrical binomial distribution, the limit distribution of the sample
variance is non-normal and has variance of order $1/n^2$.


1. For the binomial distribution, the sample proportion of 'successes', p, is a sufficient estimator of
the probability parameter $\pi$, and

$y = p(1-p)\,n/(n-1)$ (1)

is an unbiased estimator of $\pi(1-\pi)$, which is n times the sampling variance of p, where n is sample size.
p(1-p) is the sample variance. Since y is a function of the sufficient statistic, it is the minimum variance
unbiased estimator of its expectation. Its sampling variance is

$V(y) = \{n/(n-1)\}^2 \{V(p) + V(p^2) - 2C(p, p^2)\}$,

which is expressible in terms of the moments about the origin of p as

$V(y) = \{n/(n-1)\}^2 \{\mu'_2 - 2\mu'_3 + \mu'_4 - (\mu'_1 - \mu'_2)^2\}$.

On substitution for these moments we find

$V(y) = \frac{\pi(1-\pi)}{n} \left\{ (1-2\pi)^2 + \frac{2\pi(1-\pi)}{n-1} \right\}$. (2)

If $\pi \neq \frac{1}{2}$, (2) gives $V(y \mid \pi \neq \frac{1}{2}) \sim \pi(1-\pi)(1-2\pi)^2/n$, (3)

while if $\pi = \frac{1}{2}$, (2) becomes $V(y \mid \pi = \frac{1}{2}) = 1/\{8n(n-1)\}$. (4)

It is easily confirmed that the right-hand sides of (3) and (4) are the information bounds to the sampling
variance of an estimator of $\pi(1-\pi)$, to order $n^{-1}$, $n^{-2}$ respectively in the two cases.

The fact that at $\pi = \frac{1}{2}$ the variance of y is of lower order than $n^{-1}$ raises the question whether its
limiting distribution is normal in that case. In the following sections it is shown that this is not so.
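
The exact sampling variance of y can be confirmed by direct enumeration over the binomial distribution. The following sketch is offered only as a check and is not part of the original note; the function names are illustrative, and the closed form coded in `var_y_formula` is the expression $\pi(1-\pi)\{(1-2\pi)^2 + 2\pi(1-\pi)/(n-1)\}/n$, which reduces to $1/\{8n(n-1)\}$ at $\pi = \frac{1}{2}$. Exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from math import comb


def var_y_exact(n: int, pi: Fraction) -> Fraction:
    """Variance of y = p(1 - p) n / (n - 1) by summing over all binomial outcomes."""
    pi = Fraction(pi)
    mean = Fraction(0)
    second = Fraction(0)
    for k in range(n + 1):
        prob = comb(n, k) * pi ** k * (1 - pi) ** (n - k)
        p = Fraction(k, n)
        y = p * (1 - p) * Fraction(n, n - 1)
        mean += prob * y
        second += prob * y * y
    return second - mean ** 2


def var_y_formula(n: int, pi: Fraction) -> Fraction:
    """Closed form for the same variance."""
    pi = Fraction(pi)
    q = pi * (1 - pi)
    return q / n * ((1 - 2 * pi) ** 2 + 2 * q / (n - 1))


print(var_y_exact(10, Fraction(1, 2)))   # order 1/n^2 at pi = 1/2
print(var_y_exact(10, Fraction(1, 3)))   # order 1/n otherwise
```

At $\pi = \frac{1}{2}$ the leading term vanishes and only the term of order $n^{-2}$ survives, which is the singularity discussed in the note.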


2. The characteristic function of p(1— p) may be written 
Żra-n(t) = Etexp[0(p — »?)]! 
= exp (01?) E(exp [Üp(1— 27)] exp[ — Op—mn)2}, (5) 
where @ = it. Since, for random variables u, v, 
E(uv) = E(u) E(v) + C(u, v), 
we may replace the expectation of the two exponential terms in (5) by the product of their expectation? 
plus their covariance € = Ctexp[Up(1—22)). exp[—O(p m]. (6) 


Miscellanea 263 


Ifz =: R s 
+, the first variate on the right of (6) is a constant, so 
C=0 (mz-1i) (7) 


TE : 
7 3: 3, we may use differentials in (6) to obtain 
€ = {0(1—27) exp [Op(1 — 22)] (— 0 exp[— 6(p —7)]) Cip. C» —7)5 + Es, (8) 


ing term given on theright of (8). The first two factors 


ther i 
en siBel p ; 
nainder being of lower order in n than the lead 
and are there of order zero in n. 


on the rj ] 
right of (8) are to be evaluated at the true parameter point p = 7, 


The third factor is 
exa, Cp. (p—7)) = E(p-7» = 20 —7) 027) 
Xaetly, so we may write (8) as 
Dess C = kn*+0(n-*). 9 
Ing (7) and (9), (5) becomes, for all 7, ^ 
(10) 


dua D = exp (07°) sas!) Paya (0) +0(n-4). 
*. By the classical central limit theorem, 


(p-m) 7) 
haracteristie function exp (302). It follows that its square has 
20)-}. From these 


has i 
Sa limiti CEN " " 
miting normal distribution with c 
d characteristic function (1— 


a limiti X eur i 
estilte © X? distribution with 1 degree of freedom an 
8, it follows that 
Qoa s; 0) = expt 
dua) = (1+ 2000 — 


On(1 — 27) + 4021 — 7) (1— 27)*/n} (1 + 00) i 
m)/nyt (1 +0(1))- an 


Subset 
Wbstitution of (11) into (10) gives 
Qa») = expiUn(1—7)- Men(1—7)0— 
correcting for the mean 
s 7) (1— 277)? n(n- 09 
x {1+ 20n(1—m)/(n— 1)73 (1+0): (13) 


22)2/n) {1 + 202(1 —m)/nj-4(1+0(1))- 


Fron 
à the definition of y at (1), we have, 
On(1—m)/(n— 1) +40" 


n 
9y-sa-g(t) = expt 


Ir 
+4, we standardize jj. using (3), to obtain for 

Uer z= (y-al mv 

e Characteristic function ] 

d.t) = exp tinni -aiiin D 0 20] niin- 19 
inia P neget aint a FO (14) 

80 5 

that lim d.t) = exP Q0) (m+?) (15) 

n 


standardized, is normal in the limit. 
d (13) becomes 
(16) 


and the di 
he distribution of y, properly 
14-0/[2( — p-d 4-00) 


>l © 
a (13), m = 4, the term in 0? disappears an! 
du-a(t) = exp OAN- pt 


en 
°W standardize by (4) and obtain from (16) i : a7) 
Th MN gnégeem as) 
us E lim $4 = exp (0/24) 0 40.2) (7 =) 
an, no j 


d th 
caini 
F distribution is non-normal. 


From the form of (18), in fact, it is clear that $1 - \sqrt{2}\,z$ has a
$\chi^2$ distribution with 1 degree of freedom in large samples.

4. The reason for this singularity in the distribution of y is clear. When $\pi = \frac{1}{2}$, $\pi(1-\pi)$ attains its
maximum value $\frac{1}{4}$. y is an unbiased estimator of this, and has a maximum value $\frac{1}{4}n/(n-1)$,
a value which tends to $\frac{1}{4}$ when n becomes large. Since y is estimating a value which tends to one extreme
of its range, it is to be expected that the distribution of y will be skewed, and the effect becomes
more pronounced as sample size increases.
The position is comparable with that of the distribution of the squared sample multiple correlation
coefficient, $R^2$, in samples from a multinormal population. If the population multiple correlation
parameter is non-zero, $R^2$ is asymptotically normal with variance of order $n^{-1}$. But if the parameter is zero,
$R^2$, like y in our discussion above, is essentially estimating a parameter at one extreme of its range, has
variance of order $n^{-2}$, and is non-normally distributed.


5. Finally, an implication of the result should be briefly mentioned. In testing the equality of the
values of $\pi$ in two populations, we use a large sample standard error test based on the fact that

$(p_1 - p_2) \Big/ \left\{ \pi(1-\pi) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \right\}^{1/2}$

is asymptotically a standardized normal variate. If we are testing the composite hypothesis, $\pi$ being
unspecified, we estimate $\pi(1-\pi)$ by y as above, calculated from the pooled samples, with $n = n_1 + n_2$.
Our result implies that, if $\pi = \frac{1}{2}$, y estimates its expectation with greater precision than if $\pi \neq \frac{1}{2}$, and this
presumably improves the accuracy of the normal approximation using the estimated standard error.
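
The large-sample test described in this section can be sketched as follows. This is an illustration under the stated assumptions only: the function name and its arguments (success counts and sample sizes for the two samples) are hypothetical, and y is computed from the pooled samples with the factor n/(n − 1) as in (1).

```python
import math


def pooled_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Standard error test for equality of two binomial proportions,
    estimating pi(1 - pi) by the unbiased y from the pooled samples."""
    n = n1 + n2
    p = (x1 + x2) / n                    # pooled proportion of successes
    y = p * (1.0 - p) * n / (n - 1)      # unbiased estimator of pi(1 - pi)
    return (x1 / n1 - x2 / n2) / math.sqrt(y * (1.0 / n1 + 1.0 / n2))


print(round(pooled_z(60, 100, 40, 100), 3))
```

The statistic is referred to the standardized normal distribution; when the pooled proportion is close to one-half, y is estimated with the higher precision noted in the text.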


Student's distribution and Riemann's elliptic geometry


By AUREL WINTNER 
Johns Hopkins University 


The density of probability for Student's ratio is*

$c\,(1 + x^2)^{-k/2}$, (1)

where k (> 1) is the number of the variables and the value of $c = c_k$ is determined by the condition that
the integral of the function (1) over the x-line is 1. The algebraic structure of the function (1) suggests
a simple geometrical approach† to Student's distribution. In fact, such an approach leads to the
following interpretation:

If k is denoted by n+ 2, and if R, (where n> 0, hence k> 2) is the space of Riemann's n-dimension@ 
elliptic geometry, then refer R, to his normal co-ordinates 2,,...,2,, where — co <#;,< oo; consider E 
R, that distribution which represents equidistribution (in terms of the co-ordinates 2;); denote by L 
a line through the origin of R,,; finally, denote by x, where — oo <s « co, the abscissa of that point of E 
which is the orthogonal projection of an arbitrary point (25, ...,2,) of Ra. Then the function (1) on L s 
the density of probability of the orthogonal projection of the equidistribution on R,,. This can be seen as follows: 

Stereographic projection of the Euclidean n-space shows that the squared line-element on R, is 


224 2 2 (2) 

ds? = g M dai, where gi =1+ aj 
i-1 i-1 

if the ‘diameter’ of R,, is chosen to be the unit of length. But if 


n n 
de => > ijs, «+4, En) dz; dar; 
i-14-1 


is any (positive-definite) Riemannian metric, then the Riemannian volume element is (det g;ņ;)t times 
the Euclidean volume element. Hence it is clear from (2), where Iis = 98,4, that the Riemannian volume 
of that portion (infinite slab of thickness d) of the (a1, ...,$,)-space which is contained between the two 
hyperplanes £, = v, x, = z-- dv is dx times the value of the (n—1)-fold integral 


E o 
i “f (+a?+7?)-" day... daxq_1, (3) 
—o de. 


where r? = a3 +... 25.4. Accordingly, the function (3) of a, where — coa — co, is the density of the 
equidistribution projected on L, if L is chosen to be the z,-axis. 
In order to evaluate the integral, use (ordinary) polar co-ordinates,” and » — 2 angles, in the Euclidean 


(n — 1)-space (£1, ..., nx). These reduce the (n— 1)-fold integral (3) to a constant multiple of an integr? 
over r alone. In fact, the result is 


co 
ef (1 +a? +r?) rrn- dp, 
0 


* See, for example, Kendall (1952, chapter 10) where further references will be found. 
+ In this connexion see Wintner (1940, pp. 287-97 and 1947, pp. 168-73). 
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here C = C ». being the contribution of the n — 2 angles, is the Euclidean measure of the unit sph 
n E cont 1 1 iphere 
gles, 1: lid 
= 1). But if x, hence X = (l Ta), is fixed and if the integration variable r is replaced by ¢, where 
, 


r= Xii : 
; then the last integral appears in the form 


co 
f XU 4 077 (X72) e- dt, 
which, since X is i 
, X is independent of t, i i rS 
o; : p ent of t, is a constant ] -1 = (n H 
f Student's density (1), where k = n+2. nt multiple of X-32X7-1 = 1/X*! or, since X = (1+2*)!, 
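
The normalizing constant in (1) is fixed by the condition that the density integrates to 1 over the x-line. The sketch below is not part of the original note: it assumes the density has the form $c_k (1 + x^2)^{-k/2}$ and uses the standard beta-integral result $c_k = \Gamma(k/2) / \{\sqrt{\pi}\,\Gamma((k-1)/2)\}$, then checks the normalization by a crude midpoint quadrature. The function names, integration range and step count are arbitrary.

```python
import math


def student_ratio_constant(k: int) -> float:
    """Constant c_k making c_k (1 + x^2)^(-k/2) integrate to 1 (requires k > 1)."""
    return math.gamma(k / 2.0) / (math.sqrt(math.pi) * math.gamma((k - 1) / 2.0))


def integral_of_density(k: int, half_width: float = 100.0, steps: int = 200000) -> float:
    """Midpoint-rule integral of the density over (-half_width, half_width)."""
    h = 2.0 * half_width / steps
    c = student_ratio_constant(k)
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += c * (1.0 + x * x) ** (-k / 2.0)
    return total * h


print(student_ratio_constant(2), integral_of_density(4))
```

For k = 2 the constant is 1/π (the Cauchy case); the numerical check is made at k = 4 because the tails for k = 2 decay too slowly for a truncated range.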
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Some interrelations among compound and generalized distributions 


By JOHN GURLAND 
Iowa State College 


1. INTRODUCTION

It is well known (cf. Greenwood & Yule, 1920; Feller, 1943) that the Pascal (negative binomial)
distribution may be regarded as a compound Poisson distribution if the compounding is effected through
a gamma distribution. More recently it has been noted (cf. Jones & Mollison, 1948; Quenouille, 1949) that
the Pascal distribution may also be regarded as a generalized Poisson distribution if the generalizing is
effected through a logarithmic distribution.

The development in the present paper consists of the following: first, a convenient symbolism for
compound and generalized distributions is introduced; then a simple relation that exists for a certain
class of compound and generalized distributions is formulated in a theorem. Some examples relating to
the theorem are noted. Next, the relation among Poisson, gamma and logarithmic distributions,
indicated in the papers referred to above, is extended; finally, the general extension for a finite
number of stages is indicated.
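
The statement that a Poisson distribution generalized through a logarithmic distribution is a Pascal distribution can be checked on probability generating functions. The following sketch is an illustration and not part of the paper: it assumes the Pascal p.g.f. in the form $[1 - p(z-1)]^{-k}$, the logarithmic p.g.f. $\log(1 - qz)/\log(1 - q)$, and the parameter correspondence $p = q/(1-q)$, $k = -\mu/\log(1-q)$, which follows from composing the two functions. All names and numerical values are arbitrary.

```python
import math


def poisson_pgf(s: float, mu: float) -> float:
    """P.g.f. of a Poisson variable with mean mu."""
    return math.exp(mu * (s - 1.0))


def logarithmic_pgf(z: float, q: float) -> float:
    """P.g.f. of a logarithmic distribution with parameter 0 < q < 1."""
    return math.log(1.0 - q * z) / math.log(1.0 - q)


def pascal_pgf(z: float, p: float, k: float) -> float:
    """P.g.f. of a Pascal (negative binomial) variable, [1 - p(z - 1)]^(-k)."""
    return (1.0 - p * (z - 1.0)) ** (-k)


mu, q = 1.7, 0.4                 # illustrative parameter values
p = q / (1.0 - q)
k = -mu / math.log(1.0 - q)

for zval in (0.0, 0.3, 0.9, 1.0):
    generalized = poisson_pgf(logarithmic_pgf(zval, q), mu)
    print(zval, generalized, pascal_pgf(zval, p, k))
```

The two columns agree for every z in the unit interval, which is the sense in which the generalized Poisson and the Pascal distributions coincide.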


2. COMPOUND AND GENERALIZED DISTRIBUTIONS

Compound and generalized Poisson variables have been studied by Feller (1943) and Satterthwaite
(1943). More recently, compound and generalized variables other than Poisson have also been
considered (Skellam, 1952; Feller, 1950; Gurland, 1957). The present definitions and notation are given to
facilitate manipulations with such distributions.


Definition 1. Compound distribution

Let the random variable $X_1$ have the distribution function $F_1(x_1 \mid \theta)$ for a given value of the
parameter $\theta$. Suppose now that $\theta$ is regarded as a random variable $X_2$, say, with distribution function $F_2(x_2)$.
Denote by $X_1 \wedge X_2$ the random variable with distribution function

$\int_{-\infty}^{\infty} F_1(x_1 \mid c x_2)\, dF_2(x_2)$, (1)

where c is a constant which is arbitrary or restricted in a prescribed sense.† Then the random variable
$X_1 \wedge X_2$ (uniquely defined here except for the constant c) is called a compound $X_1$ variable with respect
to the 'compounder' $X_2$.

† The advantage of introducing the constant c appears in the formulation of the theorem. It should
be remarked that in the above definition the constant c could be incorporated in the $F_2$ term
instead of in the $F_1$ term, since (1) may be written equivalently as $\int F_1(x_1 \mid x_2)\, dF_2(x_2/c)$.
ad of in the F, term, since (1 
If several parameters $\theta_1, \theta_2, \ldots, \theta_m$, say, appear in the distribution function $F_1$, then the notation $X_1 \wedge X_2$
is inadequate to specify which of the parameters is involved in the compounding operation. In such
a case, to avoid ambiguity, we would write

$X_1(\theta_1, \theta_2, \ldots, \theta_m) \underset{\theta_i}{\wedge} X_2$,

where $\theta_i$, say, is the parameter specifically involved in the compounding operation.

As an example of definition 1, let $X_1$ be a Poisson variable with p.g.f. (probability generating function)
$e^{\theta(z-1)}$, and $X_2$ be a gamma variable with c.f. (characteristic function) $(1 - it/\alpha)^{-\lambda}$. Then the p.g.f. of $X_1 \wedge X_2$
is given by

$\int_0^{\infty} e^{cx(z-1)}\, \frac{\alpha^{\lambda}}{\Gamma(\lambda)}\, x^{\lambda-1} e^{-\alpha x}\, dx = \left[ 1 - \frac{c}{\alpha}(z-1) \right]^{-\lambda}$ $(c > 0,\ \alpha > 0,\ \lambda > 0)$. (2)

If we consider a Pascal variable with p.g.f.

$[1 - p(z-1)]^{-k}$ $(p > 0,\ k > 0)$, (3)

it is evident that for each set of values c, $\alpha$, $\lambda$ in (2) a set of values p, k can be assigned in (3) such that
expressions (2) and (3) have the same value whatever be z. Also, to any set of values p, k in (3) there
correspond values c, $\alpha$, $\lambda$ (not unique here) such that (2) and (3) have the same value whatever be z.
We shall call two such distributions equivalent and now give the following formal definition.
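
The equivalence of the compound Poisson (with gamma compounder) and the Pascal distribution can be illustrated numerically. The sketch below is not taken from the paper: it assumes the Poisson p.g.f. $e^{cx(z-1)}$ is averaged over a gamma density with scale parameter $\alpha$ and index $\lambda$, evaluates the integral by a simple midpoint rule, and compares it with the Pascal p.g.f. $[1 - p(z-1)]^{-k}$ under the assignment $p = c/\alpha$, $k = \lambda$. The function names, the truncation point and the step count are arbitrary choices.

```python
import math


def compound_pgf(z: float, c: float, alpha: float, lam: float,
                 upper: float = 60.0, steps: int = 200000) -> float:
    """P.g.f. of Poisson compounded with gamma, by midpoint quadrature over (0, upper)."""
    h = upper / steps
    norm = alpha ** lam / math.gamma(lam)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        gamma_density = norm * x ** (lam - 1.0) * math.exp(-alpha * x)
        total += math.exp(c * x * (z - 1.0)) * gamma_density
    return total * h


def pascal_pgf(z: float, p: float, k: float) -> float:
    """P.g.f. of the Pascal (negative binomial) distribution."""
    return (1.0 - p * (z - 1.0)) ** (-k)


c, alpha, lam = 1.5, 2.0, 3.0     # illustrative values only
print(compound_pgf(0.4, c, alpha, lam), pascal_pgf(0.4, c / alpha, lam))
```

Because only the ratio c/α enters the Pascal form, different choices of c and α lead to the same distribution, which is why the correspondence from (3) back to (2) is not unique.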


Definition 2. Equivalent distributions 


Suppose the random variables $X_1$, $X_2$ have distribution functions $F_1(x \mid \alpha)$, $F_2(x \mid \beta)$ respectively. ($\alpha$ and/or $\beta$ may be multidimensional.) If for each $\alpha$ there exists some $\beta$ and for each $\beta$ there exists some $\alpha$ such that $F_1(x \mid \alpha) = F_2(x \mid \beta)$ whatever be $x$, the random variables $X_1$ and $X_2$ are said to be equivalent and we write $X_1 \sim X_2$.


Occasionally it is convenient to represent a random variable by the name of its corresponding distribution. Thus, in the case of Poisson, gamma and Pascal distributions the meaning of the relation

$$\text{Poisson} \wedge \text{gamma} \sim \text{Pascal} \qquad (4)$$

should be clear.


The following definition of a generalized distribution is given formally in terms of probability generating functions because of greater generality and ease of manipulation.


Definition 3. Generalized distribution


Let the random variables $X_1$, $X_2$ have p.g.f.'s $g_1(z)$, $g_2(z)$ respectively. Denote by $X_1 \vee X_2$ the random variable with p.g.f. $g_1(g_2(z))$. Then $X_1 \vee X_2$ is called a generalized $X_1$ variable with respect to the 'generalizer' $X_2$.

It may be remarked here that $X_1$, $X_2$ need not be discrete variables as assumed, for instance, by Feller (1943). For $X_1$ the p.g.f. is merely taken as $E z^{X_1}$, and likewise, of course, for $X_2$.

The following theorem gives a simple relation for a certain class of compound and generalized distributions.


THEOREM. Let $X_1$ be a random variable with p.g.f. $[h(z)]^{\theta}$, where $\theta$ is a given parameter. Suppose now $\theta$ is regarded as a random variable $X_2$, say, with distribution function $F_2$ and p.g.f. $g_2$. Then, whatever be $X_2$,

$$X_1 \wedge X_2 \sim X_2 \vee X_1.$$

Proof. The p.g.f. of $X_1 \wedge X_2$ is given by

$$\int_{-\infty}^{\infty} [h(z)]^{cx}\, dF_2(x)$$

and that of $X_2 \vee X_1$ is given by

$$g_2(g_1(z)) = \int_{-\infty}^{\infty} [h(z)]^{\theta x}\, dF_2(x).$$

It is obvious, of course, that these p.g.f.'s exist, at least, for $z = e^{it}$, and are equal when $c = \theta$; hence the theorem is proved.

Some examples illustrating the above theorem are the following:

$$\text{Poisson} \wedge \text{Poisson} \sim \text{Poisson} \vee \text{Poisson}, \qquad (5)$$

$$\text{Pascal}\,(k, p) \underset{k}{\wedge} \text{Poisson} \sim \text{Poisson} \vee \text{Pascal}, \qquad (6)$$

$$\text{Pascal}\,(k, p) \underset{k}{\wedge} \text{gamma} \sim \text{gamma} \vee \text{Pascal}. \qquad (7)$$

It is, of course, apparent from the theorem that two different mathematical interpretations may underlie the same distribution. Thus, as already pointed out by Feller (1943), a distribution may be interpreted either as a compound Poisson or as a generalized Poisson distribution, as in relation (5). Relation (7) is also interesting in that the gamma variable appearing in the right-hand member is a continuous random variable.
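Relation (5) lends itself to a direct numerical illustration. In the sketch below, which is not taken from the original note, the compound form is evaluated as a Poisson mixture of Poisson probabilities, while the generalized form is obtained independently by expanding the p.g.f. $\exp[\mu\{e^{\theta(z-1)} - 1\}]$ with the standard recursion $n p_n = \mu \sum_{j=1}^{n} j f_j p_{n-j}$. The function names and the truncation point `terms` are assumptions made for the example.

```python
import math

def poisson_pmf(n, mean):
    """Poisson probability, written with logarithms to avoid overflow."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(-mean + n * math.log(mean) - math.lgamma(n + 1))


def compound_pmf(n, mu, theta, terms=200):
    """Poisson(theta * x) mixed over x ~ Poisson(mu): the compound form with c = theta."""
    return sum(poisson_pmf(j, mu) * poisson_pmf(n, theta * j) for j in range(terms))


def generalized_pmf(n_max, mu, theta):
    """Coefficients of exp(mu * (exp(theta * (z - 1)) - 1)) up to z**n_max."""
    f = [poisson_pmf(j, theta) for j in range(n_max + 1)]  # generalizer probabilities
    p = [math.exp(mu * (f[0] - 1.0))]                       # value of the p.g.f. at z = 0
    for n in range(1, n_max + 1):
        p.append(mu / n * sum(j * f[j] * p[n - j] for j in range(1, n + 1)))
    return p


if __name__ == "__main__":
    mu, theta = 2.0, 1.5
    generalized = generalized_pmf(8, mu, theta)
    for n, value in enumerate(generalized):
        print(n, compound_pmf(n, mu, theta), value)
```

The two routes use different computations, so their agreement is a genuine check that the compound and generalized interpretations describe one distribution.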


3. EXTENSION OF A CERTAIN RELATION 


As mentioned in § 1 it is known that the following equivalence holds:

$$\text{Poisson} \wedge \text{gamma} \sim \text{Poisson} \vee \text{logarithmic}. \qquad (8)$$

An immediate extension of (8) is the following:

$$\text{Pascal}\,(k, p) \underset{k}{\wedge} \text{gamma} \sim \text{Pascal} \vee \text{logarithmic}. \qquad (9)$$
To see this, let $X_i$ ($i = 1, 2$) be Pascal variables with p.g.f.'s $[1 - p_i(z-1)]^{-k_i}$ ($i = 1, 2$); let $X_3$ be a logarithmic variable with p.g.f. $-\omega \log(1 - \tau z)$ and $X_4$ be a gamma variable with c.f. $(1 - it/\beta)^{-\lambda}$. The p.g.f. $g(z)$, say, of $X_1(p_1, k_1) \underset{k_1}{\wedge} X_4$ is given by

$$g(z) = \frac{\beta^{\lambda}}{\Gamma(\lambda)} \int_0^{\infty} [1 - p_1(z-1)]^{-ck}\, k^{\lambda-1} e^{-\beta k}\, dk = \left[1 + \frac{c}{\beta}\log(1 + p_1) + \frac{c}{\beta}\log\left(1 - \frac{p_1}{1 + p_1}\, z\right)\right]^{-\lambda}.$$

On the other hand, the p.g.f. $h(z)$, say, of $X_2 \vee X_3$ is given by

$$h(z) = [1 + p_2 + p_2\,\omega \log(1 - \tau z)]^{-k_2},$$

and the required equivalence $X_1 \wedge X_4 \sim X_2 \vee X_3$ is immediately apparent.

It is also possible to extend (9) in the same manner as (8) by letting the gamma and logarithmic variables play the same role as in the previous extension. This extension could be carried through any finite number $n$, say, of stages. One might refer to the new random variables obtained in this manner as nth stage compound Poisson and nth stage generalized Poisson respectively. In such a context the Pascal distribution could be referred to as a first stage compound Poisson or generalized Poisson distribution; and the variable Pascal $(k, p) \underset{k}{\wedge}$ gamma or Pascal $\vee$ logarithmic could be referred to as second-stage variables of this type.

To sketch the method of proof for the general case let $Y_i$, $W_i$ be an ith stage compound Poisson and ith stage generalized Poisson variable respectively. Let the corresponding compounder and generalizer be gamma ($\alpha_i$, $\lambda_i$) and logarithmic ($\tau_i$) respectively.

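Repeated application of the compounding operation in (2) shows that the p.g.f.

of $Y_1$ is $\left[1 - \dfrac{c_1}{\alpha_1}(z-1)\right]^{-\lambda_1}$,

of $Y_2$ is $\left[1 + \dfrac{c_2}{\alpha_2}\log\left\{1 - \dfrac{c_1}{\alpha_1}(z-1)\right\}\right]^{-\lambda_2}$,

of $Y_3$ is $\left[1 + \dfrac{c_3}{\alpha_3}\log\left\{1 + \dfrac{c_2}{\alpha_2}\log\left(1 - \dfrac{c_1}{\alpha_1}(z-1)\right)\right\}\right]^{-\lambda_3}$,

etc. To write the p.g.f. of $W_i$ let the original Poisson variable have p.g.f. $e^{\theta(z-1)}$ and write the p.g.f. of logarithmic ($\tau_i$) as $-a_i \log(1 - \tau_i z)$, where $a_i \log(1 - \tau_i) = -1$ ($0 < \tau_i < 1$). Then the p.g.f.

of $W_1$ is $e^{-\theta}(1 - \tau_1 z)^{-a_1\theta}$,

of $W_2$ is $e^{-\theta}[1 + \tau_1 a_2 \log(1 - \tau_2 z)]^{-a_1\theta}$,

of $W_3$ is $e^{-\theta}[1 + \tau_1 a_2 \log\{1 + \tau_2 a_3 \log(1 - \tau_3 z)\}]^{-a_1\theta}$,

etc. Since $e^{-\theta} = (1 - \tau_1)^{a_1\theta}$, the p.g.f.'s of $Y_k$ and $W_k$ are of the same form for each finite $k = 1, 2, \ldots$, and hence yield equivalent random variables.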
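The second-stage equivalence (9) derived above can also be verified numerically. The following is a minimal sketch, not the author's computation: it evaluates the p.g.f. of the Pascal variable compounded by a gamma variable and the p.g.f. of the Pascal variable generalized by a logarithmic variable, using the parameter matching $k_2 = \lambda$, $\tau = p_1/(1 + p_1)$ and $p_2 = (c/\beta)\log(1 + p_1)$ implied by comparing the two expressions. The function names are assumptions made for the example.

```python
import math

def compound_pgf(z, p1, c, beta, lam):
    """P.g.f. of Pascal(k, p1) compounded over k by a gamma(lam, rate beta) variable."""
    return (1.0 + (c / beta) * math.log(1.0 - p1 * (z - 1.0))) ** (-lam)


def generalized_pgf(z, p2, k2, tau):
    """P.g.f. of Pascal(k2, p2) generalized by a logarithmic(tau) variable."""
    omega = -1.0 / math.log(1.0 - tau)           # makes the logarithmic p.g.f. equal 1 at z = 1
    log_pgf = -omega * math.log(1.0 - tau * z)   # p.g.f. of the logarithmic variable
    return (1.0 - p2 * (log_pgf - 1.0)) ** (-k2)


def matching_parameters(p1, c, beta, lam):
    """Return (p2, k2, tau) for which the two p.g.f.'s coincide."""
    return (c / beta) * math.log(1.0 + p1), lam, p1 / (1.0 + p1)


if __name__ == "__main__":
    p1, c, beta, lam = 0.7, 1.3, 2.0, 1.8
    p2, k2, tau = matching_parameters(p1, c, beta, lam)
    for z in (0.0, 0.25, 0.5, 0.9, 1.0):
        print(z, compound_pgf(z, p1, c, beta, lam), generalized_pgf(z, p2, k2, tau))
```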




4. SUMMARY 


A simple relation between a certain class of compound and generalized distributions is pointed out and a few examples are given. In particular, some distributions may be regarded both as compound and as generalized. Finally, a relation between the Poisson, gamma and logarithmic distributions is extended, which involves the Pascal distribution, and a generalization of this extension is indicated.


A note on tests of significance for linear functional relationships.


By M. S. BARTLETT 
University of Manchester 


Williams (1955) has recently developed further the exact significance tests for a single hypothetical non-null canonical variate, first proposed by him (1952) in special cases and considered further by the author (1951). He has also, however, advocated analogous tests for hypothetical linear relations or null variates in situations where these tests are not correct, and the purpose of the present note is to draw attention to this.


It seems convenient to summarize these tests in terms of the factorizations discussed in my 1951 paper. For simplicity the case of one null variate will be considered, in contrast with one non-null variate (the more general case being r null variates, in contrast with r non-null variates). It was pointed out in my earlier paper that if the regression relation of a vector variate x with p components $x_i$ ($i = 1, \ldots, p$) on another variate y with q components ($p \le q$) gives the p canonical roots $R_i^2$ ($i = 1, \ldots, p$), so that

$$\Lambda(n, p, q) = \prod_{i=1}^{p} (1 - R_i^2), \qquad (1)$$

and if a hypothetical linear combination $\xi$ of the $x_i$ gives the corresponding measure of association (more strictly, disassociation) $1 - \phi$, then the exact criterion

$$\Lambda'(n-1, p-1, q) = \Lambda/(1 - \phi)$$

may be factorized further into

$$\Lambda' = \frac{1 - R_1^2}{1 - \phi} \prod_{i=2}^{p} (1 - R_i^2), \qquad (2)$$

where the two factors in (2) may be tested approximately. In place of the 'approximate' factorization in (2), 'exact' factorizations are possible,

$$\Lambda' = \Lambda''(n-1, p-1, 1)\, \Lambda'''(n-2, p-1, q-1), \qquad (3)$$

$$\Lambda' = \Lambda^{\mathrm{iv}}(n-1, p-1, q-1)\, \Lambda^{\mathrm{v}}(n-q, p-1, 1),$$

in terms of the projection of $\xi$ in the sample space of the q variates $y_k$ ($k = 1, \ldots, q$).
Consider now the case of one hypothetical null variate $\eta$ (in relation to the set y), and let the corresponding measure of association of this with the other set y be $1 - \lambda$. Williams suggests that $\eta$ will define
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P- l variates ortho, i 
n S gonal to 7 in the sampl y. i i i 
(ieee ge mas diem y te ple space, Z, say, and that if the relation of z with y is measured 
A"(n—p--1,1,g) = A( p. q)/A'(n, p — 1,9). (4) 
(in the reduced space orthogonal to z) of the projection , say ofz 


If, moreover, the relation with x 
— 1), then the further factorization of A" in (4) is possible, 


on y is given by A"(n—p- l.l, p 
A" =A"(n—ptl, 1,p—1) A*(n—2p4-2, 1, q— p 1. 
y being taken first) 


(5) 


wit A 5 j 

ith an alternative factorization (the remaining variables in 
AT ss AT(n—p4 1, 1,47p41) A*(n—9 lp- 1). (6) 

ed. It is not immediately obvious that the total 

the variates in z are already 

tochastic 


ear 
—1,q) in (4) is still n, as 
orientated with respect to y, the si 


ie) i ; ; 
int possible minor point of obscurity may be el 
er of degrees of freedom in the criterion A'(n, p 


orth 
ogonal to 7. However, as by hypothesis 77 is randomly 


Part of z wi z . 
z will be randomly orientated with respect to y, and there is no reduction in n. Nevertheless, this 
e, forin general, of course, zis not random with respect 


er indi 
T ^r indicates the sort of difficulty that will aris 
» and under such non-null conditions for Z the relations of the variates are presumably more 


complicated. 
ve to consider the more 


but first it is instructi 


ed hes is so may be demonstrated in à simple case, 

ea ctorization corresponding to the factorization in (2) for testing £. The simple factor 1— À is of 
ype A(n, 1, q) and is expected to be insignificant. Moreover, wo may write 
1-4 í 
-À= = RS} 

1-/ Fea Rp) (7) 
te the existence of no null variate, and the significance of the 
hile the overall test based on 1—A, 


Whe er e : 
re the significance of Rp would indica 
ht null-variate. Unfortunately, w! 


ot) 

ed factor that 7 is not the rig v 
fm available, is exact (on the usual normality assumptions), the two separat: t 7 
Ls ye into approximate X han in (2), for the crudest y? testing of R? would rely 
as all the remaining p — 1 variates Z having close relation with y, and this need not be the case. However, 
Pt nal insignificance of some o i ould be evidence of more than one 
vel variato (ef. tho discussion by eere 
E Has approximations may E 
pa re difonlty with any exact a 
Variate pothetical canonical varia 

is not 5), the choice of p— 1 variates ortho 
t VE s aig to this. — lition p <q were dropped and the case g = 1, say, considered, 
Ma BER. Wie peines that if in D M condit Ta contrast, Williams gives an analysis of variance table 
9p of i ie ores z i ted not against & residual sum of squares 1-A, 
ut P. 375) in which the sum of squares A for 77 is tes lysis of y and one, moreover 
against the residual 1— R°, a quantity quite irrelevant to tho anam v iation with y 
, y iR by increasing the irrelevant essoole ion with y 
wishe i i point out, even moro 


Which 
of could be made as near zero as We 
is the hypo- 


s 1 other canonical variates. A very special case ©, and ty 
t UM that of p = 2, when there are Boc isa 
null variate. While it might be 0 as measure 


8 No inar: 
With eect for inflating the straigh 


te factors in (7) are rather 


ns in (3) is that, while we may eliminate 
st the goodness of fit of these 


nalogues of the factorizatio 
when only 7 is given a priori, 


tes if these are given @ priori (and so te: 
onal to 7) in the sample space, 


two dep 
f interest to KT 
tforward signific: 


It is clear from correspondence with Dr Williams that he fully agrees with my criticisms, and he notes that §§ VII and VIII of his paper will consequently require amendment.


The moments of the Leipnik distribution


By M. G. KENDALL 
Research Techniques Unit, London School of Economics 
1. The distribution

$$\frac{(1 - r^2)^{\frac12 n - \frac12}\, dr}{B(\tfrac12 n + \tfrac12, \tfrac12)\, (1 + \rho^2 - 2\rho r)^{\frac12 n}} \qquad (-1 \le r \le 1) \qquad (1)$$

was obtained by Leipnik (1947), following a method due to Madow (1945), as an approximation to the distribution of the first serial correlation, circularly defined, with known mean, in samples of n from a Markoff scheme. Recent work by Daniels (1956) and Jenkins (1956) shows that approximations to the distributions of the coefficient with fitted mean, and non-circularly defined, have moments which are simply expressible in terms of the moments of (1).

Leipnik himself found the mean and second moment of the distribution by a complicated procedure. Jenkins (1956) has recently obtained two more by a method which is also rather complicated. In this note I give a general expression which enables moments of all orders to be written down.


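2. We have, writing $dF$ for the frequency element (1) and $\mu'_k$ for the kth moment about zero,

$$\frac{1}{n}\frac{\partial \mu'_k}{\partial \rho} = -\int_{-1}^{1} \frac{(\rho - r)\, r^k}{1 + \rho^2 - 2\rho r}\, dF = -\frac{1}{2\rho}\,\mu'_k + \frac{1 - \rho^2}{2\rho} \int_{-1}^{1} \frac{r^k\, dF}{1 + \rho^2 - 2\rho r},$$

and

$$\int_{-1}^{1} \frac{r^{k+1}\, dF}{1 + \rho^2 - 2\rho r} = \frac{1 + \rho^2}{2\rho} \int_{-1}^{1} \frac{r^k\, dF}{1 + \rho^2 - 2\rho r} - \frac{1}{2\rho}\,\mu'_k,$$

whence

$$\frac{\partial \mu'_{k+1}}{\partial \rho} + \frac{n}{2\rho}\,\mu'_{k+1} = \frac{1 + \rho^2}{2\rho}\,\frac{\partial \mu'_k}{\partial \rho} + \frac{n}{2}\,\mu'_k. \qquad (2)$$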


3. This relation can be used to build up the moments from $\mu'_0$ (= 1) and $\mu'_1$ (= $n\rho/(n+2)$). But we can also use it to derive explicit expressions. We note in the first place that if $P_k$ is a polynomial in $\rho$ of degree k the solution of

$$\frac{dy}{d\rho} + \frac{n}{2\rho}\, y = P_k$$

is a polynomial of order k + 1. It follows that $\mu'_k$ is a polynomial in $\rho$ of order k. Moreover, it follows from (2) and the lower values of $\mu'_k$ that even-order moments contain only even powers of $\rho$ and odd-order moments only odd powers of $\rho$.

Let

$$\mu'_k = \sum_{m=0}^{k} a_{km}\, \rho^m.$$

On differentiating (2) m times and putting $\rho = 0$ we find

$$a_{km} = \frac{(m+1)\, a_{k-1,m+1} + (n+m-1)\, a_{k-1,m-1}}{n + 2m}. \qquad (3)$$


4. We have $\mu'_0 = 1$, $a_{00} = 1$, and hence

$$a_{11} = \frac{n}{n+2},$$

giving

$$\mu'_1 = \frac{n\rho}{n+2}. \qquad (4)$$

By successive application of (3) we find

$$\mu'_2 = \frac{1}{n+2} + \frac{n(n+1)\rho^2}{(n+2)(n+4)}, \qquad (5)$$

$$\mu'_3 = \frac{3n\rho}{(n+2)(n+4)} + \frac{n(n+1)\rho^3}{(n+4)(n+6)}, \qquad (6)$$

$$\mu'_4 = \frac{3}{(n+2)(n+4)} + \frac{6n(n+1)\rho^2}{(n+2)(n+4)(n+6)} + \frac{n(n+1)(n+3)\rho^4}{(n+4)(n+6)(n+8)}, \qquad (7)$$

$$\mu'_5 = \frac{15n\rho}{(n+2)(n+4)(n+6)} + \frac{10n(n+1)\rho^3}{(n+4)(n+6)(n+8)} + \frac{n(n+1)(n+3)\rho^5}{(n+6)(n+8)(n+10)}, \qquad (8)$$

$$\mu'_6 = \frac{15}{(n+2)(n+4)(n+6)} + \frac{45n(n+1)\rho^2}{(n+2)(n+4)(n+6)(n+8)} + \frac{15n(n+1)(n+3)\rho^4}{(n+4)(n+6)(n+8)(n+10)} + \frac{n(n+1)(n+3)(n+5)\rho^6}{(n+6)(n+8)(n+10)(n+12)}. \qquad (9)$$

Formulae (6) and (7) agree with Jenkins's results. The general law of formation of the terms will now be clear. The coefficients may be set out as follows:


Order of                    Coefficient of powers
moment      0     1     2     3     4     5     6     7     8     9    10

   1        .     1
   2        1     .     1
   3        .     3     .     1
   4        3     .     6     .     1
   5        .    15     .    10     .     1
   6       15     .    45     .    15     .     1
   7        .   105     .   105     .    21     .     1
   8      105     .   420     .   210     .    28     .     1
   9        .   945     .  1260     .   378     .    36     .     1
  10      945     .  4725     .  3150     .   630     .    45     .     1

The numerical coefficient of $a_{km}$ is $k!/\{m!\, 2^{\frac12(k-m)}\, [\tfrac12(k-m)]!\}$. The results are easily demonstrated by induction from (3).
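The recurrence (3) is easy to mechanize. The sketch below is illustrative and not part of the original note: it builds the coefficients $a_{km}$ in exact rational arithmetic for a chosen sample size n, so that the entries can be compared with formulae (4) to (9). The function name and the choice n = 10 are assumptions made for the example.

```python
from fractions import Fraction

def leipnik_moment_coefficients(n, max_order):
    """Return rows with rows[k][m] = a_km, the coefficient of rho**m in mu'_k.

    Uses a_km = [(m + 1) a_{k-1,m+1} + (n + m - 1) a_{k-1,m-1}] / (n + 2m),
    starting from mu'_0 = 1.
    """
    rows = [[Fraction(1)]]
    for k in range(1, max_order + 1):
        prev = rows[-1]

        def a_prev(m):
            # Coefficients outside 0..k-1 are zero.
            return prev[m] if 0 <= m < len(prev) else Fraction(0)

        row = []
        for m in range(k + 1):
            numerator = (m + 1) * a_prev(m + 1) + (n + m - 1) * a_prev(m - 1)
            row.append(numerator / (n + 2 * m))
        rows.append(row)
    return rows


if __name__ == "__main__":
    n = 10
    for k, row in enumerate(leipnik_moment_coefficients(n, 6)):
        print(k, [str(a) for a in row])
```

For n = 10 the third row should reproduce the two terms of (6), namely $3n/\{(n+2)(n+4)\}$ and $n(n+1)/\{(n+4)(n+6)\}$, with zeros in the even positions.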
ating function M ' obeys the relation 


5. Fr 
rom (2) we find that the moment gener 


18 1) - (aegra n 
19, -|——-7l 25 2 

P ( ap 9p 00 e op 

an np 

x M(0) = exP (5) AM(0). 


ng coefficients in O* we find 


We o N ; 
tain the m.g.f. of the mean moments. Identify or 
sp ak, oe o 
are p uius eee (10) 
Sin, 9p 2p ERA 2p(n+2 q 
ceu 
Mo = 1, ji, = 0 we find 
2 11 
übt ess nn- 1-p* (11) 
ni2 (n42)0(n40 ” . 
— 6p(1— 0?) ‘ 
6 2n(n-- 2) 8n 2^... zem, (12) 
Hym A V, in + 8) n? 
(n+ 2)? (n+4) m+ (nt 4) (n+ Ms 9) (n?— Ldn? 1n P 3(1—p?)? pm 
" 2 n(n — es, 
Ms aa qt m AE er gj (n4) (n 9 nts) n? 
[ p aesa n s 4 n+ T1 " " ze 
i Thing. m+n) (nt 2P 019 l - not appear that general formulae are obtainable with 
? si also agree with Jenkins’s results. It does not 8I 


"Implic: à 
Plicity of the moments about zero. 
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6. The limit of (10) with n large may be written 


[273 2 
Ree = (1p) FE — okat kp) s (14) 
Thus, with k = 4,5, we have m 60p(1— p°)? (18) 
Y n? 3 
18(1— p?) (16) 
so m C 


Again the general law is clear. In standard measure the odd-order moments tend to zero, the even- 
order moments to those of the normal distribution; and thus the distribution tends to normality. 


7. It may be of interest to record another recurrence relation between the moments, namely, 
, n+l ‘ 2 7 
a) = y(n) +3 0pm 2)-(1 p?) us (n 2), Qm 
where $\mu'_k(n)$ refers to the moment about the origin based on a sample of n.


The effect of transformations of variables upon their correlation coefficients


By M. H. QUENOUILLE 
Research Techniques Unit, London School of Economics 


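Suppose $x_t$ to be normally distributed with zero mean and standard deviation $\sigma$, and let $y_t$ be derived from $x_t$ by a monotonic transformation

$$y_t = \sum_{i=0}^{\infty} a_i H_i(x_t/\sigma),$$

where $H_i(x)$ is the ith Hermite polynomial. Let the correlation between $x_t$ and $x_{t-s}$ be $\rho_s$. The correlation between $y_t$ and $y_{t-s}$ may then be derived using the relations

$$E\{H_i(x_t/\sigma)\, H_j(x_{t-s}/\sigma)\} = 0 \quad (i \ne j),$$
$$\phantom{E\{H_i(x_t/\sigma)\, H_j(x_{t-s}/\sigma)\}} = i!\,\rho_s^i \quad (i = j).$$

These little-known relationships spring from the more-familiar equations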


T x, eae 
CORONES 
—49 (iz) 
by transforming the variables to independent normal functions, v, 


Psi 4- 3j, and using 


j pe 
* Hj 4c) = b (ô cH ;_.(y). 


It follows that

$$\operatorname{cov}(y_t, y_{t-s}) = \sum_{i=1}^{\infty} a_i^2\, i!\, \rho_s^i$$

and

$$\rho(y_t, y_{t-s}) = \frac{a_1^2 \rho_s + 2a_2^2 \rho_s^2 + 6a_3^2 \rho_s^3 + \ldots}{a_1^2 + 2a_2^2 + 6a_3^2 + \ldots}.$$

Consequently, $\rho(y_t, y_{t-s})$ is a weighted average of $\rho_s, \rho_s^2, \rho_s^3, \ldots$ and, except when $\rho_s = 1$, is necessarily less than $\rho_s$ in modulus. Further, the difference between $\rho(y_t, y_{t-s})$ and $\rho_s$ will be greater for negative $\rho_s$ than for positive $\rho_s$. It follows that, in any time series, transformation of the observations to normality maximizes the correlations, and that failure to achieve normality may be marked by the positive serial correlations being larger in modulus than the negative serial correlations.

This last fact will be most obvious when the serial correlations contain a harmonic component of amplitude b. In this instance, the average of $\rho_s^n$ over a run of values of s is zero for odd n, and

$$b^n\, \frac{(n-1)(n-3)\ldots}{n(n-2)\ldots}$$

for even n. Thus the average of $\rho(y_t, y_{t-s})$ over a run of values of s is

$$\frac{a_2^2 b^2 + 9a_4^2 b^4 + (1.3.5)^2 a_6^2 b^6 + \ldots}{a_1^2 + 2a_2^2 + 6a_3^2 + 24a_4^2 + \ldots}.$$

This expression may be fairly appreciable. For instance, for the transformation $y_t = \exp(x_t/\sigma) - \exp 0{\cdot}5$, $a_i = \exp(0{\cdot}5)/i!$, and this expression becomes

$$\frac{1}{e - 1}\left[\left(\frac{b}{2}\right)^2 + \frac{1}{(2!)^2}\left(\frac{b}{2}\right)^4 + \frac{1}{(3!)^2}\left(\frac{b}{2}\right)^6 + \ldots\right].$$

The numerator of this expression may be derived from $I_0(b)$, or, more rapidly in most instances, by calculation. For example, for b = 1, this expression is equal to 0·155, an appreciable shift in the mean value.

Further properties of an angular transformation of the correlation coefficient


By B. I. HARLEY
University College, London

T. n 
Ina previous paper ( 1956) the properties of the transformation 

y= sin-!r. 
ation with 


n-ir then 
a) 
but by considering the form of the 


d to that given in the previous 
n of 7/(1 — 3) may be 


] bivariate popul 


from a norma A à 
e expectation of sir 


le of size n 
-1y) denotes th 


is the correlation coefficient in à sample o! "P (si 
On p, were discussed and it was shown that if „(sin 


Where i 


Corr, 
elati 
d (sin! r) = sin! p 

7 for even values of n, 
yel s metho 


$55 
Xall y 
alues of n, This was proved direct 


ernative I 


Striby, c 

tion of 2 + directly an alt ons 

"per, o F 7/(1—7)* we can obtain directly that the distributio 

Igala d e | gieee values Gf. Es. —Á ae investigated in another paper (Harley, 
and t 


1957), © that of the non-central ¢ distribution, 
bivariate population 
an values of 


values, 


ormal 
ulation the me 
ns of the n sample 


2. 3 
Wit) be Vi) for i = 1,2,..., n, are n pairs of values drawn randomly re pop 
°rrelation P: Without 1688 of generality we can arrange s re the mea 
the, Y are zero, and the standard deviations are unity. If? 9? ys 
f the sample i 


ee lo; y 
oeffieient of correlation © 


s given y 


tan, 
Siin 
m 
Ing to the variables 




We 5 
fing 
that " 
8 xe+7 
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and (X;, Y;) may be considered as n pairs of values drawn randomly from a normal bivariate pee 
with zero means, unit standard deviations and correlation zero. Writing X, Y for the sample mean 
X, Y and S%, S} for their variances, it follows from (2) that 


Y (9-2) (9-9) 
r m 
al = ri n n 


n 2)1 
È (,- 2? Y (9 — ( (z,— 2) (3) | 
i1 i=1 122 o, 


ap È (X-X) Yi P +p X Y-F 
i= i= 


n n — 72): 
a-i [3 ax, - xy X q-re- Xc-30-»]| 
i-1 i= i= 
n 


3 (X,- X)(Y;,- Y) 


i=1 


n Rem. = n "em — TW 
(È -2r $ w,- Pt- $ -mo.- » j) 


e) -4 
(5 w- Py)! 5 a-3m-n] 
+ _ i-1 " j- Lie x 
- n ul cm ya 
(1—p?) ( Quy * (X;— Xy 3 Y- Y) 
i=1 i=l i=1 


SES ES P SH 1 
(=i =p ss) ri 


where $r_0$ is the correlation coefficient of a sample drawn randomly from a normal bivariate population with zero correlation,


"s S S 
Let ari ^ and g^ 
then TE 04) 
and 


m 
ü-mi^ to + PM -- i) tan (sin-! p). 


Let sn1p=0 and Fitan0 = 2; 
3) 
then — S ( 
(ug tet e. 
8. Jf E, = &,(sin-1r), 
then Bh 6 (m) 
= &, (tan [t +2(1 tl. 
OE, — Fè sec? 9 


é, . 
t» 20 "(0 +24) (148)! + 21,2) à 
2 " ca 
$F$ and $t_0$ are independent for the case we are considering, and thus to derive the expectation we integrate first with respect to $t_0$ and then with respect to $F$.
The distribution of $t_0$ is known to be


1 


ee a 
B(4n-1,3) (1 Tini 
We wish to show that E, = sin p = 0. If n = 3 


P(to) = 


1 1 
M e prs) 
su) marg PO = ule 
ôB, _ E zi E dto F)dF 
Henge 6 Jo m -»Gx5nxprez asa [^ pe 
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Let t, = si wi 
S H i 
o inh ¢; then the integral with respect to to becomes 


we cosh ódó 


12 [75 
Le {(1 +2?) cosh ø + 2z sinh ¢}cosh? ó 


+2 — sech*ódó 


% I. 1+2%+2ztanh¢ 


1 


1 


2 


2 


Hence D f 


Inte; i 
grating by parts we find that 2E,/20 


P = 0 we have 
+1 
m-f 
-1 


1 
x 


Thus G = 
C = 0 and, in general, E, = sin"! p. 
deduced the foll 


In a previous paper (1956) we 
got 

ex =p 2p 
= 3, we have just seen that 


En- 


Cony 
equently the left-hand side of e 


Substitus; 
ituting (n — 1) = 5,7,9,... in (4) in turn, 
= dy (sinr) = 6,(sin*") 


in^! 
sin-! p 


Wen 
ave shown previously that 


coe; en ees een 
Hence sinp = 6,(sin DES 
E,(sin7) = Sin" 


Hanrzv, B. I. (1 
HARLEY, B. 


Heterogeneity of err 


By FRANKLIN 
Oklahor 


na 
random; 
re eq, zed block experiment v 


Squay, 


ð , 
— 5. (DL 13) = 1- 


quaj , 
e n When we have heterogeneit 
the error mean square is nO ibuted as Snedeco 


li a +o 
22 "DI 


-0 


m +F? tan 0) 
(1— Fitan0)? 


tanh-! (F? tan 8). 


© 2sec?0 
o Tian tanh- (Fl tan 0) (1-- F)*dF. 


= 1. Thus £, = sin? pte, where C is a constant. When 


sincir 
x magii e 


cci Lr. 
d^. sinc1z(1—72)73dr = 0. 
in-!r)?]* bs 1p(1—72)-2 d 


zu. 
owing recurrence relationship: 


P^ (n= 2)[8q-s(6in 27) —Enexlsin 0) (4) 


& (sinr) = sin! p. 


(sinr) = 
ows that 


quation (4) is zero and it foll 


(sin) = 6,(sin7) = sin-! p. 


we see that 
fi (ain tn) as 


& linr) = gnr) = 
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or variances ina randomized block design 
axp JOHN LERO 


d M echanical College 


A. GRAYBILL y FOLKS 
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1. IxTRODUCTION 
s that all the treatment means 


the hypothesi 
ratio of the treatment mean 


however. the 
tb distri r’s F. Fo example, consider & randomized 
block design with p treatments occurring on each of r blocks. If each of the first $n_1$ treatments have variance $\sigma_1^2$ and each of the next $n_2$ treatments have variance $\sigma_2^2$, etc., and if

$$\sum_{i=1}^{q} n_i = p,$$

the mathematical model may be written as

$$Y_{ijk} = \mu + b_k + t_{ij} + e_{ijk} \qquad (i = 1, 2, \ldots, q;\ j = 1, 2, \ldots, n_i;\ k = 1, 2, \ldots, r), \qquad (1{\cdot}1)$$

where the $e_{ijk}$'s are assumed to be normally distributed such that $E(e_{ijk}) = 0$ for all i, j, and k; $E(e_{ijk}^2) = \sigma_i^2$ for all j and k; and $E(e_{ijk} e_{smn}) = 0$ unless i = s, j = m and k = n.


Graybill (1954) has given an exact test, based on Hotelling's $T^2$, for testing the hypothesis,

$$H_0: t_{11} = t_{12} = \ldots = t_{1n_1} = t_{21} = \ldots = t_{qn_q}.$$

This test is valid if $r > p - 1$ and requires inverting a matrix of order $p - 1$.

The purpose of this paper is to give an exact test of $H_0$ which is simpler than that given by Graybill when $n_i > 1$ for at least one i. If $n_1 = n_2 = \ldots = n_q = 1$, then the test now proposed reduces to that given by Graybill. The latter is valid when $r > p - 1$ regardless of the values of $n_i$, but the new method is valid if $r > q - 1$ and requires the inversion of a matrix of order $q - 1$. Therefore, if $n_i > 1$ for at least one i, the method proposed in this paper is considerably simpler than the earlier method.


2. THE TEST CRITERION 


Using the ith subset of observations* let us conduct an analysis of variance as in Table 1 for each subset for which $n_i > 1$.


Table 1. Analysis of variance for i-th subset


Source of variation    Degrees of freedom      Sum of squares

Treatments             $n_i - 1$               $r \sum_j (\bar y_{ij.} - \bar y_{i..})^2 = A_i$
Blocks                 $r - 1$                 $n_i \sum_k (\bar y_{i.k} - \bar y_{i..})^2 = B_i$
Error                  $(r - 1)(n_i - 1)$      $\sum_{jk} (Y_{ijk} - \bar y_{ij.} - \bar y_{i.k} + \bar y_{i..})^2 = C_i$

The ratio $(r - 1)A_i/C_i = F_i$ (where $Y_{ij.}$ indicates $Y_{ijk}$ summed over k and $\bar y_{ij.}$ indicates the average of $Y_{ijk}$ when summed over k, etc.) is distributed as Snedecor's F with $(n_i - 1)$ and $(r - 1)(n_i - 1)$ degrees of freedom if and only if the hypothesis $H_i$: ($t_{i1} = t_{i2} = \ldots = t_{in_i}$) is true.

If we let $q - 1$ represent the exact number of subsets of $Y_{ijk}$ in which $n_i > 1$, then we will run an analysis of variance as in Table 1 for each of these subsets. Also since $Y_{ijk}$ is independent of $Y_{smn}$ if $s \ne i$ (regardless of the values of j, k, m and n), these $q - 1$ analyses of variance will yield $q - 1$ independent $F_i$, each distributed as F with the appropriate degrees of freedom if $H_i$ ($i = 1, 2, \ldots, q - 1$) is true. For definiteness let us assume that the first $q - 1$ subsets each have $n_i > 1$. That is to say, suppose $n_1 > 1, n_2 > 1, \ldots, n_{q-1} > 1$. This in no way affects the generality of the discussion and the result is that if the hypotheses

$$H_1: (t_{11} = t_{12} = \ldots = t_{1n_1});\quad H_2: (t_{21} = t_{22} = \ldots = t_{2n_2});\quad \ldots;\quad H_{q-1}: (t_{q-1,1} = t_{q-1,2} = \ldots = t_{q-1,n_{q-1}})$$

are each true, then $F_1, F_2, \ldots, F_{q-1}$ are independently distributed as F with degrees of freedom

$$(n_1 - 1),\ (r - 1)(n_1 - 1);\quad (n_2 - 1),\ (r - 1)(n_2 - 1);\quad \ldots;\quad (n_{q-1} - 1),\ (r - 1)(n_{q-1} - 1)$$

respectively.
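The computation of Table 1 for a single subset can be sketched as follows. This is an illustrative implementation rather than the authors' own: the function name `subset_f_ratio` and the layout of the data (one list per treatment of the subset, one entry per block) are assumptions, and the numbers in the usage example are made up.

```python
def subset_f_ratio(y):
    """Analysis of variance for one subset.

    y[j][k] is the observation on treatment j of the subset in block k.
    Returns (A, B, C, F), where F = (r - 1) * A / C as in Table 1.
    """
    n_i, r = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n_i * r)
    treat_means = [sum(row) / r for row in y]
    block_means = [sum(y[j][k] for j in range(n_i)) / n_i for k in range(r)]

    a = r * sum((m - grand) ** 2 for m in treat_means)      # treatments
    b = n_i * sum((m - grand) ** 2 for m in block_means)    # blocks
    c = sum((y[j][k] - treat_means[j] - block_means[k] + grand) ** 2
            for j in range(n_i) for k in range(r))          # error
    return a, b, c, (r - 1) * a / c


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0],
            [2.0, 4.0, 3.0],
            [5.0, 5.0, 8.0]]
    print(subset_f_ratio(data))
```

The returned F would be referred to Snedecor's F with $(n_i - 1)$ and $(r - 1)(n_i - 1)$ degrees of freedom; one such ratio is computed for each subset with $n_i > 1$.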


* By the ith subset of observations we will mean all observations where the first subscript is equal to i. For example the second subset of observations consists of $Y_{2jk}$ where $j = 1, 2, \ldots, n_2$; $k = 1, 2, \ldots, r$.

If we average the model (1·1) over the subscript j we get

$$\sum_{j=1}^{n_i} \frac{Y_{ijk}}{n_i} = \mu + b_k + \sum_{j=1}^{n_i} \frac{t_{ij}}{n_i} + \sum_{j=1}^{n_i} \frac{e_{ijk}}{n_i}. \qquad (2{\cdot}1)$$

If we denote $\sum_j Y_{ijk}/n_i$ by $B_{ik}$, and let $T_i = \sum_j t_{ij}/n_i$ and $d_{ik} = \sum_j e_{ijk}/n_i$, then we can write (2·1) as

$$B_{ik} = \mu + T_i + b_k + d_{ik}.$$

From the assumptions in (1·1) it follows that

$$E(d_{ik}) = 0, \qquad E(d_{ik}^2) = \sigma_i^2/n_i,$$

$$E(d_{ik} d_{is}) = 0 \text{ for } k \ne s; \qquad E(d_{ik} d_{jk}) = 0 \text{ for } i \ne j.$$

This is the model considered by Graybill (1954) and we can use Hotelling's $T^2$ to test the hypothesis $T_1 = T_2 = \ldots = T_q$. The procedure is to define the $(q - 1) \times 1$ column vector $X_j$ by

$$X_j = \begin{pmatrix} B_{1j} - B_{qj} \\ B_{2j} - B_{qj} \\ \vdots \\ B_{q-1,j} - B_{qj} \end{pmatrix}$$

and define the $(q - 1) \times 1$ vector $\bar X = \sum_{j=1}^{r} X_j / r$. Then if $H_0$ is true, the quantity

$$F_q = \frac{r(r - q + 1)}{q - 1}\, \bar X' \left[\sum_{j=1}^{r} (X_j - \bar X)(X_j - \bar X)'\right]^{-1} \bar X$$

is distributed as F with $q - 1$ and $r - q + 1$ degrees of freedom (if $r > q - 1$).


is distri 
ributed as F with r—1 and r—g+ 1 degree 
y independently distribute 
i ach functions of 


We ` 
Tiet ja now show that F}, Fa ..., Fa are jointl 
Vij,— Yi.) = u;; and (Jus — Mix Yat Yin) = vyp Since Fa, Fos sss Fa-1 are © 
i and since tij "ij and By, are all normally distributed, it 
) = cov (Bus: vij) = 0 for all 


Uy 

; and v, 1 

f ijj, and since F, is a function of Bmw 
ly indepen 


ollow; 
. Ows that F, F. eed 
53. k, m and Ton sa Fai FQ are joint 

et us consider the case when m = d: 


Iti 
is obvi 
bvious that cov (Bnn ti 
cov (Bins Mis 


dent if cov (Bmn Mis 


)=0 for mx i. Li 
= Elei. n) (6.7 


Also į rhj j 
it igi 
is immediately clear that cov (Bas tur) = 0, for m t? Let us consider the case when m — i: 
cov (Bin Vise) = Elei.) (eg — eis. tikt ei) 
3 Chik 
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~ piu m N 
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re, that the F; (i = 1, 2,..0) E J Ds a tyn We can test Ho, therefore, 

aa A 


anq " 
nly i Y 
b Y if, Hy is true; i.e. if, and only if; 41 = te 
shods of combining independent tests 


Combin: 
nar ining q independent tests of significance- 
f sig e Cent paper Birnbaum (1954) has summariz rious met ; 
91-1). If Fisher's test is used, then the procedure 
te the significance level 


SI P 

to, BRificance includi i 

esti including Fisher's test (Fisher, 1946, $214). 
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An extension property of a class of balanced incomplete block designs 


By G. P. SILLITTO 
Imperial Chemical Industries Ltd, Metals Division 


Let a balanced incomplete block design which enables v varieties or treatments to be compared in b blocks (each block accommodating k varieties, each variety occurring once in each of r blocks, and each pair of varieties occurring together in $\lambda$ of the blocks) be represented by a matrix having v rows and b columns, in which the element in the ith row and jth column is +1 if the ith variety is tested in the jth block, and −1 if this variety is not tested in the jth block. Then it is not difficult to show that the rows of the matrix are orthogonal to each other if, and only if,

$$b = 4(r - \lambda). \qquad (1)$$


This note is concerned with balanced incomplete block designs for which (1) holds, and in particular 


with pointing out a ‘law of composition’ which enables a third design to be obtained by a simple process 
from any two such designs. 


Using the known relations for any balanced incomplete block design

$$rv = bk, \qquad (2)$$

$$\lambda = r(k - 1)/(v - 1), \qquad (3)$$

a little algebra shows that if (1) is to be true, then necessarily

$$2k = v \pm \sqrt{v}, \qquad (4)$$

so that (1) can only hold if v is the square of an integer. If (1) is true, the two values for k given by (4) are the values for a design and its complement. However, (4) serves to determine only the ratios $b/r$ and $\lambda/r$, through (2) and (3) respectively. Hence what may be called the 'simple multiples' of a design $(v, b, k, r, \lambda)$, i.e. the spectrum of designs $(v, nb, k, nr, n\lambda)$ in which n is an integer or any fraction which makes $nb$, $nr$ and $n\lambda$ all integral, are covered by the same conditions as have been discussed. The value of this observation arises from the fact that when a design allowed by (2) and (3) does not exist, or is difficult to find, some of its simple multiples often exist and can easily be found.

Designs conforming to (1), (2) and (3) are of the form

$$v = u^2, \quad b = 2uN, \quad k = \tfrac12 u(u \pm 1), \quad r = (u \pm 1)N, \quad \lambda = \tfrac12 (u \pm 2)N, \qquad (5)$$

where u is an integer > 1; N is an integer $\ge \tfrac12 u$ such that b, r and $\lambda$ are integers; and designs with the parameters corresponding to + signs represent the complements of the designs with parameters corresponding to − signs. From any of these designs which exist and can be found, other designs can be obtained by using the law of composition given in the next paragraph.

From any pair of known designs for which (1) is true there can be found a third design, by the following
process. Call the matrices of the two known designs $M_1$ and $M_2$; then the direct product $M_1 \times M_2$ of
these two matrices is obtained by substituting for each element +1 of $M_1$ the matrix $M_2$ and for each
element −1 of $M_1$ the matrix $-M_2$. Then the matrix $M_3 = M_1 \times M_2$ has $v_3 = v_1 v_2$ rows and $b_3 = b_1 b_2$
columns, and all its elements are +1 or −1. A little consideration shows that all the rows of $M_3$ contain
the same number $r_3 = r_1 r_2 + (b_1 - r_1)(b_2 - r_2)$ of +1 elements, and all the columns contain the same
number $k_3 = k_1 k_2 + (v_1 - k_1)(v_2 - k_2)$ of +1 elements. Further consideration of the different kinds of
row which occur in $M_3$ shows that in any pair of rows, the number of columns in which both rows have
+1 elements is $\lambda_3 = 6 r_1 r_2 - 8 r_1 \lambda_2 - 8 r_2 \lambda_1 + 12 \lambda_1 \lambda_2$. Hence $M_3$ is the matrix of a balanced incomplete
block design with parameters $v_3, b_3, k_3, r_3, \lambda_3$.

It is easily found also that $M_3$ conserves the property of row-orthogonality, and hence it in turn can
be used as a factor in forming the direct product matrix representing yet another design; and so on.
Clearly also in the direct product $M_1 \times M_2$, $M_2$ can be identical with, or the complement of, $M_1$ if desired.
Therefore, if one or more of the designs in the series (5) are known, an infinite number of balanced
incomplete block designs can be constructed.
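
The composition rule can be tried on a small case. The sketch below is only an illustration, not part of the original note: it assumes rows stand for varieties and columns for blocks, and the helper names `kron` and `parameters` are invented here. It composes the unreduced design (4, 4, 3, 3, 2) with its own complement and counts the parameters of the product directly.

```python
from itertools import combinations

def kron(A, B):
    """Direct product: each +1 of A is replaced by B, each -1 by -B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def parameters(M):
    """Return (v, b, k, r, lam) for a +1/-1 matrix M, or raise if M is not balanced."""
    v, b = len(M), len(M[0])
    rs = {sum(x == 1 for x in row) for row in M}                  # +1's per row
    ks = {sum(M[i][j] == 1 for i in range(v)) for j in range(b)}  # +1's per column
    lams = {sum(x == 1 and y == 1 for x, y in zip(p, q))
            for p, q in combinations(M, 2)}                       # shared +1's per row pair
    if len(rs) != 1 or len(ks) != 1 or len(lams) != 1:
        raise ValueError("matrix does not represent a balanced design")
    return v, b, ks.pop(), rs.pop(), lams.pop()

# Unreduced design (4, 4, 3, 3, 2): variety i is absent only from block i.
M1 = [[-1 if i == j else 1 for j in range(4)] for i in range(4)]
M2 = [[-x for x in row] for row in M1]   # its complement, (4, 4, 1, 1, 0)
M3 = kron(M1, M2)

v3, b3, k3, r3, lam3 = parameters(M3)
# Compare the direct count with the formula for lambda_3 quoted in the text.
formula_lam3 = 6 * 3 * 1 - 8 * 3 * 0 - 8 * 1 * 2 + 12 * 2 * 0
```

Counting by brute force rather than trusting the formulae keeps the check independent of them; the same two helpers can be applied again to `M3` to form further products.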


The first members of the series (5) are

u    N    v     b     k     r     λ     Notes
2    1    4     4     3     3     2     Known, unreduced design
3    2    9     12    3     4     1     Known, see Fisher & Yates (1953)
4    2    16    16    6     6     2     Known, see below
5    4    25    40    10    16    6     Not known, but see below
6    3    36    36    15    15    6     Known, see below
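
The rows of the table can be regenerated from (5). The following is a minimal sketch under stated assumptions: the function names are invented for illustration, and the relation b = 4(r − λ) is used here as a stand-in for the row-orthogonality condition, which is how this sketch reads condition (1); the other two checks are the usual counting identities for a balanced incomplete block design.

```python
from fractions import Fraction

def series5(u, N, sign=-1):
    """Parameters (v, b, k, r, lam) of series (5); sign=+1 or -1 selects the upper or lower signs."""
    k = Fraction(u * (u + sign), 2)
    r = (u + sign) * N
    lam = Fraction((u + 2 * sign) * N, 2)
    return u * u, 2 * u * N, k, r, lam

def is_consistent(v, b, k, r, lam):
    """bk = vr, lam(v-1) = r(k-1), and (assumed form of the orthogonality condition) b = 4(r - lam)."""
    return b * k == v * r and lam * (v - 1) == r * (k - 1) and b == 4 * (r - lam)

# (u, N, sign) for the five tabulated designs; the first row uses the + signs.
rows = [(2, 1, +1), (3, 2, -1), (4, 2, -1), (5, 4, -1), (6, 3, -1)]
table = [series5(u, N, s) for u, N, s in rows]

# The 3/2 simple multiple of the u = 5, N = 4 entry, i.e. u = 5, N = 6.
multiple = series5(5, 6)
```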
A design with u = 4 and N = 2 is well known, being given by Fisher & Yates (1953). Such a design
can also be obtained by composition of the design for u = 2, N = 1 with its own complement by the
preceding method.
No design corresponding to u = 5, N = 4 appears to be known, but the 3/2 simple multiple of it, i.e.
u = 5, N = 6, v = 25, b = 60, k = 10, r = 24, λ = 9, is obtained (following Bose, 1939) by considering
the modul of residue classes (mod 5) and letting five varieties $u_1, u_2, u_3, u_4, u_5$ correspond
to any element u of the modul. Then the design is obtained by adding the elements of the modul to the
following twelve initial blocks, keeping the suffixes invariant:
(0,. li; 2, 31, 825 2 dy la da 05) 


(0,, 1,. 2 
(0,. 44. 0. (ys Og, 1o, 255 Bay 3 20 Sar lo 4,) 
(15. 41 Oa Oss Lay 2s» Bg, 35 25 45) 


(lis Oo. ta ` 3 

Gy Tay Ons da 0p 1 20 Bay 35 45) (2), dy los dos Oa Ds Lay 2a 3 35) 

(Bry 44 29 1g Oss Sar Osr lo p (31, 2e 4o. las 4a» Os» Os 1,95. 85) 

(Oy. 14. Ogs Los Oss Lay Or La Osr Is) (04, 245 Oar 25. Os 25. Os» Zar Oss 2s) 

N = 3 appears to have been published, but one can be 
d letting four varieties wj, Ua» t» Uy 

d by adding the elements of the m 


obtained by 
correspond 


N " 
o design corresponding to « = 6, 
odul to the 


Consider; 
to inr ring the modul of residue cla: 
follow element u of the modul. Then 
ng four initial blocks, keeping the sw 
(00,, 02,, 10), 121+ 20, 22. 015, 125 124, 214) 
(00! 125 214, 005, 02,, 105, 124» 20m 220 Obs 125, 20,, 00,, Ola 024) 
2 


(00,, Oli, 024, 005. 125, 21s. 005, 02,, 10g, 125, 20s» 22s» Ola 12,, 20,) 
(01,, 121, 201, 002» 204, 224) 


Ola, 02;, 003; 125, 215, 00, 02,, 104. 124 
made by the referee which have been incorporated 
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in many special cases the sequenti 

| ation, as compare 
imple manner, 


l 
D A n 
umber of authors (see Anscom 
hown that 
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cad to any marked advantage in problem! 
snote to demonst 
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f a confidence interval 
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moderately simple conditio -— ruction d 
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Teter yee that a sequential procedure leading to th 

(i) ihe the following properties: m P 
probability that it ter minates after n wen Fm fidence ir 


ii) th 

E sie itv of obtaining à 

pl aay ional probability of obtaining ® 7 A 

à oe terminates after n stages, 18 Ow and Wn 7$ independent of 0. 
re as & procedure such tha 


e 
fine an optimum procedu 
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terval statement, given that the 
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overall confidence coefficie 

(b) 2 i inimum 

Subjec X nP, 189 minur . 
t to le number n i s 

m TAN (a) the average samp n21, ay general con itions effectively optimum pro- 

dures = theorems which follow show that un db if certai further conditions are satisfied, either 

(a eed have at most two non-zero © n s, and tha”, the nearest values of the w’s to Q, or 

here Om € 5 ^ "T js optimum- This means that optimum 

=i hereto. 


etw 
0 non-zero P’s are P» and Pau Ww "s a h 
procedure wit - close approximations t 
The problem, and its solution, is similar to problems arising in the theory of linear programming, in
particular those described by Beale (1955).


2. THEOREM 1. If $P_1, P_2, P_3, \ldots$ satisfy the conditions

(i) $P_n \ge 0$,

(ii) $\sum_{n=1}^{\infty} P_n = 1$,

(iii) $\sum_{n=1}^{\infty} \omega_n P_n = \Omega \quad (\omega_n > 0,\ \Omega \ge 0)$,

and $b_1, b_2, b_3, \ldots$ are finite real numbers then there is a set $\{P_n\}$ containing not more than two
non-zero values, for which either

(a) $B = \sum_{n=1}^{\infty} b_n P_n$ takes its minimum possible value, or

(b) B takes a value exceeding its lower bound by an arbitrarily small amount.

Proof. We first prove that if B attains its minimum value for a set with a finite number of non-zero
P's this number need not be greater than two.

Suppose that a finite number N (≥ 3) of non-zero P's, $P_{\alpha_1}, P_{\alpha_2}, \ldots, P_{\alpha_N}$, say, satisfy conditions (i)-(iii)
of the theorem. Then values of the ratios $\delta_{\alpha_1} : \delta_{\alpha_2} : \ldots : \delta_{\alpha_N}$ can be found satisfying the conditions

$$\sum_{i=1}^{N} \delta_{\alpha_i} = 0 = \sum_{i=1}^{N} \omega_{\alpha_i} \delta_{\alpha_i}. \tag{1}$$

If we define $\delta_j = 0$ for all j not equal to any $\alpha_i$, then the set $\{P_n + \delta_n\}$ will satisfy conditions (i)-(iii)
provided $\min (P_{\alpha_i} + \delta_{\alpha_i}) \ge 0$. By a suitable choice of signs of the δ's we can ensure that

$$\sum_{i=1}^{N} b_{\alpha_i} (P_{\alpha_i} + \delta_{\alpha_i}) \le \sum_{i=1}^{N} b_{\alpha_i} P_{\alpha_i}.$$

Hence, increasing the numerical values of the δ's until $\min (P_{\alpha_i} + \delta_{\alpha_i}) = 0$, we obtain a fresh set of P's
satisfying (i)-(iii) with N' (< N) non-zero values $P_{\beta_1}, P_{\beta_2}, \ldots, P_{\beta_{N'}}$ and

$$\sum_{i=1}^{N'} b_{\beta_i} P_{\beta_i} \le \sum_{i=1}^{N} b_{\alpha_i} P_{\alpha_i}.$$

This argument can be repeated as long as N ≥ 3. Since we start with a finite number of non-zero P's
the case N = 3 will eventually be reached, and a further application of the argument will reduce the
number of non-zero P's to two.
Hence if B has a minimum value attained with a finite number of non-zero P's this number need not
be greater than two.
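
The reduction to at most two non-zero P's suggests a simple search when only finitely many stages are allowed. The sketch below is an illustration only: the function name and the numerical values of ω_n, b_n and Ω are invented, and exact fractions are used so that the constraints hold without rounding. For every pair of stages whose ω's bracket Ω it solves conditions (ii) and (iii) for the two P's and keeps the pair giving the smallest B.

```python
from fractions import Fraction as F

def best_two_point(omega, b, Omega):
    """Minimise B = sum b_n P_n over sets with at most two non-zero P's,
    subject to sum P_n = 1 and sum omega_n P_n = Omega.
    Returns (B, {n: P_n}) with stages numbered from 1, or None if infeasible."""
    best = None
    for i in range(len(omega)):
        candidates = []
        if omega[i] == Omega:                      # a single non-zero value P_i = 1
            candidates.append((F(b[i]), {i + 1: F(1)}))
        for j in range(i + 1, len(omega)):
            lo, hi = sorted((omega[i], omega[j]))
            if lo < Omega < hi:                    # Omega strictly between the two omegas
                p_i = (omega[j] - Omega) / (omega[j] - omega[i])
                value = b[i] * p_i + b[j] * (1 - p_i)
                candidates.append((value, {i + 1: p_i, j + 1: 1 - p_i}))
        for cand in candidates:
            if best is None or cand[0] < best[0]:
                best = cand
    return best

# Hypothetical data: b_n = n (sample size) and an increasing, concave omega_n.
omega = [F(1, 2), F(4, 5), F(9, 10), F(19, 20), F(39, 40)]
b = [1, 2, 3, 4, 5]
B_min, support = best_two_point(omega, b, F(17, 20))
```

With these invented values the minimum falls on two adjacent stages, which is the situation Theorem 2 describes for curves that are convex from above.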


Now consider sets of values (P) containing an infinity of non -zero P’; 
the value of B for each set; must be the limit of the val 
's of non-zero P's, 


isfy 
8. Since the P’s must t 
ues of this sum for a seque 


r nd 

ll sets {Pa} with not more than two non-zero P's there 1$ 

nine of B for all sets of (P,} satisfying (1)-(ii). 

3. THEOREM 2. If (i) $b_n$ and $\omega_n$ are each increasing functions of n and (ii) the curve with parametric
equation $(b_n, \omega_n)$ is convex from above, i.e. if n < n' < n'', then

$$(b_{n''} - b_n)\,\omega_{n'} \ge (b_{n''} - b_{n'})\,\omega_n + (b_{n'} - b_n)\,\omega_{n''}, \tag{2}$$

then there is a set $\{P_n\}$ satisfying conditions (i)-(iii) of Theorem 1 which minimizes B, and this is the set
containing either

(a) only the two non-zero values $P_m$, $P_{m+1}$, or
(b) the single non-zero value $P_m$.
roof. This theorem is « intuiti i " 3 wW 
B A e E. Kos een z "ia by intuitive geometrical Considerations, but the proof given belo 
(i) Assume that there is no w. 
Consider the set (P 
the equations 


m41 Where m is defined by 


: a Y 0, « Qc or 
m = lif there is an On = Q y Ue 
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Hence 

= 05— 9 Bos Q-—o, 

(05— Oa (9g — U, 

and diu 

b Pa tbgP, = 1 
-— Pa + bg ET [bz(w@g— 2) - b ( 0. 9). 
"idently w : 
y we must have 9, «Q «0g, 


le. 
a<m<mt+1<f. 


If x 
m +1 we can replace f by f with 
popemt+ k 


Th 
e new 
new values P^, P, " a A 
a must satisfy equations analogous to (3). Hence 


/ 
L 
[ba (o5 — 2) bg (Q—04)] 


b, P, + bp P = 


and p oe 
imei EE [(bg— bg) [ES (bg — ba) wg—(bg- bg) wp] 


b, P , 
«P, + bp Pie — b, Pa—bpP a= u 
(og — Oa) (pr ad) 


<0, 
ion (2), since pp 
py p' <P. Similarly. ifa<n 
of B is a minimum for the set (P. 


2 B is decreased by replacing & by 


by 
Y reas r 

son of the convexity condit 
m} with the two non-zero values 


Hen 
ce is 
9^ where 3 decreased by replacing 
Pw P, do <m. Hence the value 
TE diss à 
1. and the value of this minimum 1S 
1 


[brn Om+1 
From th Om+1— Om 
of Theore " ie to Theorem 1 this is also the minimum value of B for all sets (P,,) satisfying (i)-(iii) 


(ii) I 

Sets (pus =Qa similar argum! 

Princ P atisfying (i)-(iii) of Theorem ] with 
m41 Not equal to zero. The value of B for this set is 

— Buda I Hal 


by the convexity property (2). 


- Q) bsa( 7 09) 


imum value of B, among all 


on that the min 
is attained by the set with 


e conclusi 
o values, 


ent leads to thi 
just two non-Zer 


(4) 


But the v, DELE 
alue of B for the set {Pn} with Pm = Lis by» which is less than (4) 
i he value of B. 


ence j 
iw. = rie 
Um = Q, the set Pm = 1 minimizes t 


2, it follows that i 2, Wn) is convex from above (as defined 


b ie 
Y (2 
e di m the optimum sequential procedure for constructin| nfidence intervals with assigned 
ce coefficient Q is a fixed sample procedure or a close approximation thereto. This covers & 
heory including, for example, all cases where wn is 


, 


f the curve (7 


so Anscombe, 1949). 


Ey, 

9D j & there will be effecti 

Opti if the eur : al ill be effectively 

mu urve (n, Wn) is not convex : ‘ 

ne ™ procedures d two sample sizes, althou oW differ by moro than un: 

cong Sults obtained in hen di are valid for any decision rocedure (i.e not only for the construction 

a m 18 

Parameter” intervals) provided the On" only on ® cene ipea apply 

Sf. ruetio or the procedure by which ? is chosen. For * se ei d they apply d 

to; Son en of sequential tests comparing two hypot eter 0 being estimate 
Titio ce intervals if Wp were allowed to depend 


1 D n eM dt (see al 
A(27) J 0 


from above; 


the construction 
d (subject to the 


n X 

Xo, P, = Q independent of 0). 
Estimation of means of normal populations from observed minima


By H. A. DAVID 
University of Melbourne 


1. INTRODUCTION 
Tn life tests it is often advanta, 
say k, has been observed amo 
impossible or impracticable ti 
sured follows an exponential distri 
detail (see, for example, Epstein & Sobel, 
sideration in this paper, Gupta (1952 
of the mean / and the standard devi 


r some preassigned number of apre 
e are other situations where it is wm 
readings. When the West aba i in 
fairly simple and have been studie 


: ths, 
" : - ere the problem is to estimate the breaking streng 
Z4 8nd ug, at A and B (expressed as the logarj 

when » similar wi 


* Ci o be 
ll consider th pendent N(,, e?) variates. ASSEN ee oie 
all consider the estimation of murals 
and the method of moments, . s ia "rue MCA NAMUR 
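The likelihood function L(μ, σ) is given by

$$L(\mu, \sigma) = \prod_{i=1}^{n} \frac{m}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right] \left\{ \int_{x_i}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt \right\}^{m-1}.$$

Let $u_i = (x_i - \mu)/\sigma$, $z(u) = (2\pi)^{-1/2} e^{-u^2/2}$,

$$Q(u_i) = \int_{u_i}^{\infty} z(t)\,dt \quad\text{and}\quad A_i = z(u_i)/Q(u_i). \tag{1}$$

Then log L may be written

$$\log L = C - n \log \sigma - \tfrac12 \sum_{i=1}^{n} u_i^2 + (m-1) \sum_{i=1}^{n} \log Q(u_i),$$

where C is a constant. The maximum-likelihood equations are

$$\sigma\,\frac{\partial \log L}{\partial \mu} = \sum u_i + (m-1) \sum A_i = 0 \tag{2}$$

and

$$\sigma\,\frac{\partial \log L}{\partial \sigma} = -n + \sum u_i^2 + (m-1) \sum A_i u_i = 0. \tag{3}$$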
These equations may be solved numerical] ie T" 
1931, Table IT also Editoria] 1955, Biome, = aan "S d ieee eee i 
However, for n large it is much easier to estimate μ and σ by the method of moments. We have
Elx) = Lo 


TU Xe Dv, 
PME 
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Where z = Yx,/, 

x = Xz;,/n and c, = 1 — 1(,— 1)/ : is th E . 5 

while 8. ; i Kd. Bin 2 m+...3 Cn is the factor making o* an unbiased est 

Bo is the coefficient of kurtosis of the extreme. £,, and Vm as well as the coefficients ae: 
1 Ps OF t. 


large: 
eae m normal deviates have been tabulated by Ruben (1954) for m< 50. 
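We now compare the large sample variances of μ* and σ* with those of the maximum-likelihood
estimators $\hat\mu$ and $\hat\sigma$. The latter are, of course, only asymptotically unbiased. From (2) and (3) it follows
that

$$-\frac{\sigma^2}{n} \frac{\partial^2 \log L}{\partial \mu^2} = 1 + \frac{m-1}{n} \sum A_i (A_i - u_i),$$

$$-\frac{\sigma^2}{n} \frac{\partial^2 \log L}{\partial \mu\,\partial \sigma} = \frac{2}{n} \sum u_i + \frac{m-1}{n} \sum A_i (1 + A_i u_i - u_i^2)$$

and

$$-\frac{\sigma^2}{n} \frac{\partial^2 \log L}{\partial \sigma^2} = -1 + \frac{3}{n} \sum u_i^2 + \frac{m-1}{n} \sum A_i (A_i u_i^2 + 2u_i - u_i^3).$$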
Table 1. Large sample variances in terms of σ²/n of maximum-likelihood and moment estimators
($\hat\mu$, $\hat\sigma$ and μ*, σ*) derived from n minima of m independent N(μ, σ²) variates

m     var μ̂         var μ*    var σ̂    var σ*
1     1.000         1.000     0.500    0.500
2     0.781         0.782     0.509    0.515
3     0.796         0.806     0.514    0.532
4     0.854         0.874     0.517    0.541
5     0.922         0.956     0.520    0.550
6     0.991         1.041     0.522    0.558
7     [illegible]   1.124     0.523    0.566
8     1.120         1.204     0.525    0.572
9     1.180         1.281     0.526    0.578
10    1.236         1.354     0.527    0.583
12    1.340         1.491     0.529    0.592
15    1.477         1.675     0.531    0.603
20    1.668         1.937     0.533    0.617
30    [illegible]   2.350     0.537    0.637
40    [illegible]   2.672     0.539    0.651
50    2.362         2.935     0.541    0.661
To obtain expected values we require terms of the type $E(u^r A)$ and $E(u^r A^2)$.
Since the density function of u is

$$f(u) = m\,z(u)\,Q^{m-1},$$

we have

$$E(u^r A) = m \int_{-\infty}^{\infty} u^r z^2 Q^{m-2}\,du = \frac{m}{m-1} \int_{-\infty}^{\infty} (r u^{r-1} - u^{r+1})\,z\,Q^{m-1}\,du = \frac{1}{m-1}\left[r E(u^{r-1}) - E(u^{r+1})\right].$$

Similarly, by two integrations by parts, we find for m > 2

$$E(u^r A^2) = \frac{1}{(m-1)(m-2)}\left[r(r-1) E(u^{r-2}) - (3r+2) E(u^r) + 2E(u^{r+2})\right].$$

Thus in all these cases the covariance matrix of $\hat\mu$ and $\hat\sigma$ can be obtained in terms of the first four moments
of the extreme. For m = 2 numerical integration is necessary for the evaluation of $E(u^r A^2)$.
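
The two reduction formulae for $E(u^r A)$ and $E(u^r A^2)$, with A = z(u)/Q(u) and u distributed as the smallest of m unit normal deviates, can be checked numerically. The sketch below is an illustration, not the author's computation: it uses Simpson's rule on a wide finite interval, the helper names are invented, and m = 3, r = 2 are arbitrary choices that satisfy m > 2.

```python
import math

def z(u):
    """Unit normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Q(u):
    """Upper-tail area of the unit normal beyond u."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def A(u):
    return z(u) / Q(u)

def expect(g, m, lo=-12.0, hi=12.0, steps=4800):
    """E[g(u)] under the density m z(u) Q(u)^(m-1), by Simpson's rule (steps must be even)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * g(u) * m * z(u) * Q(u) ** (m - 1)
    return total * h / 3.0

m, r = 3, 2
# E(u^r A) directly, and through the moments of the extreme.
lhs1 = expect(lambda u: u**r * A(u), m)
rhs1 = (r * expect(lambda u: u**(r - 1), m) - expect(lambda u: u**(r + 1), m)) / (m - 1)
# E(u^r A^2) directly, and through the moments of the extreme (needs m > 2).
lhs2 = expect(lambda u: u**r * A(u) ** 2, m)
rhs2 = (r * (r - 1) * expect(lambda u: 1.0, m)
        - (3 * r + 2) * expect(lambda u: u**r, m)
        + 2 * expect(lambda u: u**(r + 2), m)) / ((m - 1) * (m - 2))
```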
The variances for large n of u*, o* can be obtained by standard methods. We find 


var y* = P Vint HEE Be Y-EnVv(VnBb 


g? 
varg* — mi 1). 


In Table 1 the large sample variances of $\hat\mu$, μ* and $\hat\sigma$, σ* are compared for m = 2 (1) 10, 12, 15, 20, 30, 40, 50.


3. CASE OF TWO UNEQUAL MEANS
lation: 
i i f two normal popu! 
i e are concerned wi and j, o PRP ; 
In this section w f n pairs of observations a, and Y: 


A 'he 
+2,...,k) and y; (j = 1,2, n—k). T. 


log L = C~nloga— (Eug + Ev?) E log Q(uj) +E log Q(vj), 
where Ui = (v—p)le, v= (;— ny)e, 
U = (ti—pe, vf = (yj ue, 


and Q(u;) is defined as in (1). With 4, = z(uj)/Q(u/) and B,— z(o, 


" be 
)/Q(v;) the likelihood equations may 
4) 
written UNES 7 xp ( 
My = E+ p» 
4 ê m 
=7+—— XA, 
Mye ok 1; 
1 PN " PST. a (6) 
and n= x DX 4k rg Xun-gyr-(n— 1) T- fs xa eru XByw;. 
[ 
Forn not too large these equations may again be solved numer ically 
In this case the moment approach is not s 
estimators of E(x 


T gare 
o simple as in the preceding section. We note that z and 7 
[2 «y) and 6(y|y<2), and 


he 
ectations in terms of t 
We have 


Perles =|" ftsgydypPr(s e y) 


= fla) | INWY), 
ed that if y, = 


where £ = (H2— Ky) (20), lt may be observ. Hy then 
Hela<w = 3ft) f” sandy, 
T 


nsity function of the smalle: 
Now 


QE) &( |x<y) -f uf (x) fw dy dex 


= E 
-[ fe" af (x) de dy 


-f l e| Leii tie 
-ed(2m)o P gl o yy Van (93 Ha) et? dedy (2 = (x— p.20) 

zl aes. -1(v-njs c WORT — js | ay 
J zene” i aeea- 


T 1 
= #2 Q(E) — PN exp [ Bec uem] " 


r of two identically distributed variates. 
Writing $A_\xi$ for $z(\xi)/Q(\xi)$ we have

$$E(x \mid x < y) = \mu_x - \frac{\sigma}{\sqrt2} A_\xi \tag{7}$$

and

$$E(y \mid y < x) = \mu_y - \frac{\sigma}{\sqrt2} A_{-\xi}. \tag{8}$$

Similarly, we find

$$\operatorname{var}(x \mid x < y) = \sigma^2 \left(1 + \tfrac12 \xi A_\xi - \tfrac12 A_\xi^2\right) \tag{9}$$

and

$$\operatorname{var}(y \mid y < x) = \sigma^2 \left(1 - \tfrac12 \xi A_{-\xi} - \tfrac12 A_{-\xi}^2\right). \tag{10}$$

Equations (9) and (10) could be combined to provide a third estimating equation. However, it seems
preferable and simpler to take instead k/n as an estimator of Pr (x < y), i.e.

$$Q(\xi^*) = k/n. \tag{11}$$
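
Solving Q(ξ*) = k/n only needs an inverse normal probability. As a rough sketch, assuming Q denotes the upper-tail area of the unit normal as in (1), the Python standard library can do the inversion; the function name is invented, and k = 5, n = 16 are used purely as illustrative counts.

```python
from statistics import NormalDist

def xi_star(k, n):
    """Solve Q(xi) = k/n for xi, where Q is the upper-tail area of the unit normal."""
    return NormalDist().inv_cdf(1.0 - k / n)

# Illustrative values: k observations of x out of n pairs.
xi = xi_star(5, 16)
check = 1.0 - NormalDist().cdf(xi)   # should reproduce k/n
```

A proportion below one half gives a positive ξ*, and a proportion above one half a negative one.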


whieh gi 
1Vi * 1 H 
gives £* immediately and hence o* 


c 
ITE (T) 
N 
= LZ 
There į " . 
ere is one point worth noting here. From (7^) and (8^) we have an equation for o*, viz. 
(12) 


y2 8 -yie* = 2£* — (Ag — Ap) 
r*, Therefore, if 5* is positive (i.e. K>) 


ame sign 8$ 5 dee à 
tory ation likewise if £* <0 but >. This situation 


ut Tis] 

“18 less than 7, t > i 5 
n J, then (12) will have e amt velt different, and suggests that the simpler model 

nd jr, & iction we may; especially in large samples, 


My i : A 
v 13 adequate. Even without finding suc s cantly from A 


decide; 
e 
ini favour of the simpler mod 


icati he same load until 

j d to re eated applications of tl 
abjecte ra eleven at B. A and B were so far apart 
ote the observed breaking- 


king at A 
d t. Leto and y; den: : 
red in logarithmic units from a suitable 


4, ANU 


Sixteen sim 
Teaking į 
n 
hat the at 
Stron 
Origin 


ilar aeroplane wings were in turn sv 
A or B, five brea 


Fe one of two positions, i d 
rength at A and B may be regarded a5 inaapi s 
B at A and B respectively- The following values (M 
Vere obtained: 
Ti: 
uo ce vuoti, T3066 03410; 0:2142, 0-2175, 0:3474, 0:3674, 00789, 0:1642. 
0:5197, Y4 ms 


i E 0-2519, — 0:0667, 02842, 0:0380; pinea 
"ons (4)-(6) give the maximum-likelihood estimates of jiz fy 
fi, 20421 My = 9.384 and g = 0:166 

i “responding moment estimates are from (11), (12): m p 

du n aes " in RE. gue imple method of estimation. 
by a Sample size n = 16 is in this case not really gufconty larg for “he estimation of c was of sub- 
Sidia Vents to be altogether satisfactory» especially a TP re the estimates of fz and {i obtained 

erimont Y E different specimens 


fe, 4 
"om ¢ Interest as the purpose of the expe 


€Sts on wi sts 
Wi A fatigue es * B 
acl. d omm , poratories: Victoria, for referring to 


Tani . T 
rA the n debtod to Mr D. G. Ford, of the AoronM S ications are iven in se Lei ry "d m 
955, Prob p tails on ble comm! o ers 

aa th Dr = pe in $$ je rd pearls made Ed ud E. Jacobs of the University of 
^u. J. Williams an rof. B. >- . Laby and Miss 7 


Mays 
‘elbow È Table 1 was compute 
The power of the Poisson index of dispersion


By J. H. DARWIN
Department of Scientific and Industrial Research, New Zealand 


l. The distribution of the index of dispersion, Z — 


75; s.-s En, drawn from a 
X with n—1 degrees of 
of Z when the parent 
and eumulants K, Which are o(uar) for r2. natives 

This problem, which arises in considering the power of the yi_, test of Z against the alteri teman 
described above, has been tackled in part b d Kathirgamatamby (1953). Ba of 
discusses Neyman’s contagious distribution as an alternative. She finds the first four moments a 
and shows that for alarge sample size or alarge mean these moments tend in form to those of (1 + inen 
mean) for Neyman's distribution. Kathirgamatamby dis 
-1a/n, where y2 


finds 
5-1, 18 such that Pr(V31>Xk-1,¢) =a He 
ts distributions 


: isson are 
to these when the alternatives to tho Poisso 
bution, the negative bi i 


Š iables 
È («;—%)2/z, for a sample of independent varia 
i 
Poisson distribution with a high 


freedom, say Ni-1- Itis the obje 
distribution of the 2’s is not Poiss 


m fa 

mean, is known to be approximately en ja 
ct of this note to find the limiting distri Loup 
on but has a high mean 4, a variance of o 


the first four eu 
Thomas's distri 


2. Z may be written (D(x, —3)*/n) [a] pu). Suppose the distri 
Kı(=4), Ke)... Then the joint characteristic function of 2; 
Eexp [Zib (x; z)] = 

function of the (x; 


MXO;~B) Kyi? 


nts 
bution function of æ; has cumula! 

@ (j= 1,...,n) is 

Eexp[iEz(0,— 0j]. 


The joint cumulant generating —®)/./u is then 


KyiT (1) 
p "mdi Dres eS Y By... 
Hence if k, = (ui) for r = 3. 


of n normal var 


transformation of the (w,;— 
Tchebycheff” 


ion 
9(0,— 0,25) the cumulant generating mur 
^ — 1. It follows easily by the use of an ome 
tribution of Xa, —%)?/k, is that of y3_,. Ag& orem 
as Jt becomes large, Hence by a convergence the 

of Zu|k, for large y is that of xà. a 
yman’s contagious distribution of Type A, a neg 


i $ ^ " n 
binomial, and Thomas's ton when each of these hasa large mean and a ratio (variance) /(me* 


which is 0(1) in y. 

If an alternative to the Pois 
E exp [A(z — 1)], where À has a di; 
K; of dF(A). The condition that Kı and x, are larg 
becomes large for r = 3,4, sex Bi i 
«is is then suffici 


distribut 


son has a com 


& y, and 
found the joint distribution of s= Uy 


; j-1 M dard 
ndently from a Universe with zero mean, unit sti j 
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and A, neglected, where A, = «,/«}". 
oximation in the A's. 
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deviatio 
n " 
From this eee bc oem terms of higher order than those in A2 
F z E ribution it is easy , 2 
or simplicity write 3-1. = pile KIT Pr (Z> X3-1,2) to this order of appr 
u = D (x;— #2 - EX. 
ae observations x; have been drawn fi a f th 
iance x, = a wn from one of the alternative populati it c= 
2 = ap, then in terms of the standardized variables y and ee ae menio in 
1 2 
IIo owardrihai u = S,au- Xp — X8yap[n. 
e index of dispersion test based on Z, that is, the probability of establishing significance 


at the 1 
00c % level when a> 1, is therefore Pr (u> 0). 


Gayen’ 
s fi ee . e 
first approximation to the joint distribution of S; and 5, is 
ESL Sje9ei8 
(3) 


ISS) = Win- 1) = Tian 2io-»T(n — 
: (n—1) = jag) 2-9 Tn — D 
and S, to S, and w, where 
x * 
i B= Sy S 
* nyap) ap 
rm of the distribution of w 


Tf now 
we change our variables from S; 


and inte 
grate out for S,, we find for the leading te 
mU d 


Se) = gie-STQn- D 


1 degrees of freedom. Hence our first approximation 


E 
is aj T - 
t pproximately distributed as x° with n— 


o the power is 
Which is th Pr(Zz X) = Pr(uz 0) = Pr (wz X/a) = Pra? Xi sala); 
i Ebete © result reached in § 2 for the term of zero order in po. There is no term of order u~? in the 
n of w; that in 47! has contributions from the expansion of 
4 Un-3) 
1 Spree = lesa] agde 
in’powe nl(au) 
ay of u~? and from the appropriate terms in Gayen’s coefficient of W(n—1)- 
onvenience we write 
easy cal Fy,= Pr (322 Xala)» 
T(Z- PRENDA give for the power of the test based on Z to order 4/7, and for n> 5 
?XYh-na) = Pr(Z>X) 
" pA A(n-DX p.a - 27, am 
Maat Snap [o ES 2F,-s FT “in Man) Jam (Prat n-1t ^n-3 
Aun 1* n7 DP) qe, - itd 
+ a(n — 1) Qus BP P + 3 ip (Fras BF yas + BF Fa. (3) 
have been made from this formula for 
ive binomial distribution. 


8n 
NNI calculations h 
a = 0-05, som? d for the negat 


Taki 
in CT 
a . a E the significance level as i 
*s distributions an 


Sand 
For these ia vs of Thomas's and Neyman n f a only: 
ee distributi d Àg are functions O S A 
ate give Values for the tions Ayal ee patote were as shown in Table 1, while the values for tho power 
n in Table 2, together with the approximations derived by Kathirgamatamby. 
Table 1 
agent Case = 2:0 
ase = i 
Alternative i x 
| Aut Asi alt 
- As vA | a Wa HN 
€ | | 
AE MEL ord | 
3 3-4496 
Th T 171618 
domes 1-47015 is 176777 3-7500 
M Pons 1-49691 2.6667 2.12132 6-5000 
4) ive binomial 1-63299 j mie. ee 
> CRM: | 


TEN 
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The differences, with respect to v, of the functions F, were found by the use of the formulae 


F(z) 
Erle) - Fay) TET, 
o[95-s) êF x) (4) 
F(x) - 2E, (x) + E, (x) = [n e Y 


= E) enm (5-5+ 1) ‘ 


Approximations to the power of the index of dispersion test 
(The figures in parentheses are Kathirgamatamby’s values.) 


Table 2. 


aed a= 20 
| | | 
| | Main | | Main 
#=1 5 | 10 term =l 5 w | em 
n - 101 | | | 
Thomas 0-874 | 0-889 | 0-890 | | 0:999 | 0-999 | 0-999 | | 
(0:870) | (0-888) | (0-890) | | (0-999) | (1-000) | (0-999) | 
Neyman 0-809 | 0-888 | 0-890 0-899 | 0:999 | 0-999 -— 0-999 
(0-865) | (0-887) | (0-890) (0-999) | (0-999) | 
Neg. binomial | 0-834 | 0.38] 0-887 0-996 | 0-998 | 0-999 | 
(0-834) | (0-880) (0:886) (0-988) | (0-997) | (0-999) 
ies -— lis a | —— 63 
n-51 
Thomas 0-636 0-066 | 0-670 0-962 | 0-962 | 0-962 
. (0-635) | (0-667) | (0-669) (0-956) | (0-960) 
eyman 0-629 | 0-665 | 0-669 E |. 9.962 
(0665) | (0-666) | (0-669) | 9674 | — n 
Neg. binomial | 0-588 0-657 | 0-665 0-910 | 0-952 | 0-957 
(0-606) | (0-659) (0-666) | (0-898) | (0-949) | (0-955) 
é = 
n= 20 | | | 
Teia 0337 | 0379 | ossa | 0-094 | 0713 | 0-716 
eyman 0333 | 0-378 | 0-383 || oag : 7 : dee 
vip si | 9 0-682 0-711 0-714 
E Pinomial | 0-310 | 0373 | ösa 0-559 | 0-686 | 0-702 
a Is j i Mu si 
n - 10 | | d —— 
Thomas 0-20 e | | 
ii 1 | 0-246 | 0-251 | 0:443 | 0-479 | 0-484 
| 0199 | 0-245 | 0951 || asy 0-431 | 0-477 | 0-483 0-489 
Neg. binomial | 0-18] | 0-242 | 0-249 
| | | 0319 | 0-455 | 0-472 
|-——4-— — ud 
n=6 | | | | | E 
Thomas | 0138 | 0383 | | 
BON 
d á 0-307 — 0-345 | 0-349 
emen | 0135 | 0182 | ones | 0194 | 0.296 | o342 | ogag | 0354 
Neg. binomial | 0-118 9.179 0-186 0-199 | 0-323 | 0-339 | | 


Gi pa — n 
——— a 

au-— 

di 
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For all 
A n except 101 i mm 
introductie the differences were check 
dap ea 3 checked by the method of i i 

from the Wilson Shia & Hartley's tables (1954, p. 13). Fo! te Deemer pee 

The table toti ilferty normal approximation. istis a p SG 

ows in its comparison of (3) with Kathirgamatamby's results thi h 
: X at the main term 
F, = Pr (Xì > X&-uala) 
,/k, and not on the higher cumulants, is à good approximation to th 
X he power 


when n = 51 and 101. Tt is less than 2-5 % higher than Kathirgamata by’ 
is required the second term gives an d. Ren 


whic} 
for m, depends only ona = 
figure mindy irae purpos 
en fi 
for p as low as 5. If more accuracy 


Correctj 
ction d 
OW] =5 H 
n to jJ; = 5 and in most cases down to Jt = I. 
the accuracy of the correction when a = 6, 10 
, 


e have noi 
an 20 and ye are calculation by which to check 
T= 51 ar s the same values as before, although it is to be ex it wi 
Ad Ui, gl expected that it will not be as good as 


My thank: 3 
anks are due is: LH. > for z n part of the calc w] 
e to Miss B. I. Harley for checking the main part f the calculations, hich were in 


om, 
e places i 
: es in need of correction. 
Some properties of the bivariate normal distribution
considered in the form of a contingency table


By H. O. LANCASTER


School of Public Health and Tropical Medicine, Sydney
uency distributions Karl 


ivariate freq 
a two-way table the con- 


y tables and b 
classified in 


ontingenc. 
bution 18 


1 
Pe, | In consi 
nsideri 
3 sidering the properties of c 


so 
tins, 2 (190 
Beney meg showed that if a bivariate normal distri ; 
he correlation paramoter are related by the expression 
tis ass, gen = P0 ep a) 
Ssur 
S eq ^ ed that N, the number of observations, is large and that the class intervals are very narrow 
Tiag, ation, in fact, represents à limiting property: or a p-way classification based on a multi- 
Pearson also showed that 
(2) 


eq 
rris] ai 
al i à 
distribution of dimension P 


geal WRR- E 
is the determ 


x Rand R'is ete 
proposed his coefficient o 


inant of the matrix (21— R). 


Wh 
Cre p 
Tis the 
8 on rales ssa of the correlatio f contingency 
basis of these results that Pearson s 


2g 
"n 
149 
tion par 
1nchange 


n matri 


(3) 


ameter pP- More generally, it may be 

i Let v d by any arbitrary, not necessarily 

riates. 
int O 

from the pom 

y tables sad" ub 


] to th 
istribution, 
oth va 


he J; 
Spas, Mitin 
lin, ded as ác normal ease this is equa 
Chg ee property ofad 
z In 194 ation of the scale of either OF b 

" 0 Fi 
b Ong? ? that Fisher considered contingene. 
MS In Scores’. i ; ; a are assign! ü x 
ij dig&eney fats es’, i.e. arbitrary variate values. the rows so that à linear 
ma crenti le: what are the best scores to assign agens This turns o 
*ximj ‘tiata the classes d ined by the columns, and vice vers „relations are t 

sses determinec M “ial the required correla 
Biom. 44 


Izin; 
t 5 " 
E the correlation between the scores 


f view of discriminant analysis. 
rows and also to the columns of 
function of them will 
ut to be a problem 
hose known as 
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1941). 
i c inued and developed by Maung ( 
‘canonical’ in th of Hotelling (1936). The work was continue Y aer 
in pes net ee a result by Fisher which gives the observed frequency in term 

n > 


ifthe 
s à la „and ift 
canonical correlations; in fact, if the frequency is ai; with marginal totals a; , a ; and total a., 
canonical correlations are R,, Ry, +++) Ry, We have 


m- 


aa; 1 (4) 
a; = m 2 (uae) , 


where z and y are the assigned scores corresponding to the given cell. 


3. In this note I derive some further theorems in this field and link together some hitherto
disconnected results. It will be shown that (3) is, in the limit, equivalent to the well-known Mehler identity
or tetrachoric series:

$$(2\pi)^{-1} (1-\rho^2)^{-1/2} \exp\left\{-\tfrac12 (x^2 - 2\rho xy + y^2)/(1-\rho^2)\right\} = (2\pi)^{-1} \exp\left(-\tfrac12 x^2 - \tfrac12 y^2\right) \left\{1 + \sum_{i=1}^{\infty} \rho^i \psi_i(x) \psi_i(y)\right\}, \tag{5}$$

where $\psi_i(x)$ is the Hermite-Tchebycheff polynomial of order i.
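
Identity (5) is easy to check numerically at a single point. The sketch below is an illustration under stated assumptions: it takes $\psi_i$ to be the standardized polynomial $He_i(x)/\sqrt{i!}$ (so that $\psi_1(x) = x$), divides both sides of (5) by the product of the two marginal densities, and truncates the series at an arbitrary number of terms. The function names and the test point are invented.

```python
import math

def psi(n, x):
    """Standardized Hermite-Tchebycheff polynomial He_n(x) / sqrt(n!)."""
    p_prev, p = 0.0, 1.0                      # psi_{-1} (unused) and psi_0
    for j in range(n):
        # psi_{j+1} = (x psi_j - sqrt(j) psi_{j-1}) / sqrt(j + 1)
        p_prev, p = p, (x * p - math.sqrt(j) * p_prev) / math.sqrt(j + 1)
    return p

def mehler_series(x, y, rho, terms=80):
    """Truncated series 1 + sum_{i>=1} rho^i psi_i(x) psi_i(y)."""
    return sum(rho**i * psi(i, x) * psi(i, y) for i in range(terms + 1))

def density_ratio(x, y, rho):
    """Bivariate normal density divided by the product of its unit normal marginals."""
    q = (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / math.sqrt(1.0 - rho * rho)

x, y, rho = 0.7, -1.2, 0.4
series_value = mehler_series(x, y, rho)
closed_value = density_ratio(x, y, rho)
```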


A MAXIMAL PROPERTY OF THE BIVARIATE NORMAL DISTRIBUTION 

4. We may consider the variables standardized so as to have unit variance and take 0 < ρ < 1.
THEOREM. Let x and y be jointly distributed in the bivariate normal distribution with correlation ρ. If
now a transformation,

$$x' = x'(x), \quad y' = y'(y),$$

is made to any new variables x' and y' such that

$$(2\pi)^{-1/2} \int_{-\infty}^{\infty} x'^2 \exp\left(-\tfrac12 x^2\right) dx \quad\text{and}\quad (2\pi)^{-1/2} \int_{-\infty}^{\infty} y'^2 \exp\left(-\tfrac12 y^2\right) dy$$

are finite, then the correlation of the new variables is less in absolute value than ρ. That is, ρ is the maximum
canonical correlation.


Under the conditions of the theorem, the new variables may be expressed in a series of standardized
Hermite-Tchebycheff polynomials,

$$x' = a_0 + a_1 \psi_1(x) + a_2 \psi_2(x) + \ldots \tag{6}$$


2 2 
such that ema f (47 Sean) exp (— 4) ax} 
— I 


ge 
2 :near chan 
convergent, Moreover, the correlation is unaffected by either a linea! 


» 80 that without loss of generality we can write, 


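$$x' = \sum_{i=1}^{\infty} a_i \psi_i(x), \quad \sum_{i=1}^{\infty} a_i^2 = 1; \qquad y' = \sum_{i=1}^{\infty} b_i \psi_i(y), \quad \sum_{i=1}^{\infty} b_i^2 = 1, \tag{7}$$

where the $\psi_i(x)$ obey the orthogonality relations

$$(2\pi)^{-1/2} \int_{-\infty}^{\infty} \psi_i(x) \psi_j(x) \exp\left(-\tfrac12 x^2\right) dx = \delta_{ij}. \tag{8}$$

By a consideration of the expectation of $\exp(tx - \tfrac12 t^2 + uy - \tfrac12 u^2)$, we find

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \psi_i(x) \psi_j(y) f(x, y)\,dx\,dy = \delta_{ij} \rho^i, \tag{9}$$

where f(x, y) is the frequency function of the bivariate normal population.
The variances of x' and y' are easily found to be unity by the orthogonal properties of the $\psi_i$, and

$$\operatorname{corr}(x', y') = \sum_{i=1}^{\infty} a_i b_i \rho^i, \tag{10}$$

and this is less than |ρ| unless $a_1 = b_1 = 1$. |ρ| is therefore the maximum canonical correlation under
a very general class of transformations. This was proved by Maung (1941) by an alternative method.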
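
Relation (9), and the bound on the correlation that follows from it, can be illustrated by direct numerical integration over the plane. The sketch below is an assumption-laden illustration rather than part of the proof: it takes $\psi_i = He_i/\sqrt{i!}$, uses a midpoint rule on a finite square, and picks ρ = 0.5 and the coefficients (0.6, 0.8, 0) arbitrarily; all names are invented.

```python
import math

def psi(n, x):
    """Standardized Hermite-Tchebycheff polynomial He_n(x) / sqrt(n!)."""
    p_prev, p = 0.0, 1.0
    for j in range(n):
        p_prev, p = p, (x * p - math.sqrt(j) * p_prev) / math.sqrt(j + 1)
    return p

def cross_moment(i, j, rho, half_width=9.0, h=0.1):
    """E[psi_i(x) psi_j(y)] for a standardized bivariate normal, by a 2-D midpoint rule."""
    steps = int(round(2.0 * half_width / h))
    grid = [-half_width + (s + 0.5) * h for s in range(steps)]
    pi_x = [psi(i, x) for x in grid]
    pj_y = [psi(j, y) for y in grid]
    d = 2.0 * (1.0 - rho * rho)
    c = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    total = 0.0
    for s, x in enumerate(grid):
        for t, y in enumerate(grid):
            total += pi_x[s] * pj_y[t] * math.exp(-(x * x - 2.0 * rho * x * y + y * y) / d)
    return c * total * h * h

rho = 0.5
moments = {(i, j): cross_moment(i, j, rho) for i in (1, 2, 3) for j in (1, 2, 3)}

# Transformed variables x' = sum a_i psi_i(x), y' = sum b_i psi_i(y) with unit variances.
a = b = (0.6, 0.8, 0.0)
corr = sum(a[i - 1] * b[j - 1] * moments[(i, j)] for i in (1, 2, 3) for j in (1, 2, 3))
```

With these coefficients (10) gives 0.36ρ + 0.64ρ², which is below ρ for any 0 < ρ < 1.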
5. THEOREM. The values to be assigned to the canonical variables corresponding to the canonical
correlations in descending order are $\psi_i(x)$ and $\psi_i(y)$. The canonical correlations are the powers of ρ.
We have already shown that the maximum canonical correlation corresponds with $\psi_1(x) = x$ and
$\psi_1(y) = y$. Let us take a second set of values x'' and y'', such that

$$E(x'') = E(y'') = 0 \quad\text{and}\quad E(xx'') = E(yy'') = 0 \quad\text{and}\quad E(x'')^2 = 1 = E(y'')^2.$$

We may write

$$x'' = \sum_{i=1}^{\infty} c_i \psi_i(x), \qquad y'' = \sum_{i=1}^{\infty} d_i \psi_i(y). \tag{11}$$

The conditions of the theorem again enable us to set $\sum c_i^2$ and $\sum d_i^2$ equal to unity.
Further E(xx'') = 0 forces $c_1$ to be zero and similarly $d_1$ is zero.
Now

$$\operatorname{corr}(x'', y'') = \sum_{i} c_i d_i \rho^i, \tag{12}$$

and this is maximal in absolute value only if $c_2 = d_2 = 1$ and all other $c_i = d_i = 0$.
This process can be extended by induction and proves the theorem.
COROLLARY 1. If a choice of variables can be made so that a joint bivariate normal distribution results
from a contingency table, then the canonical correlations are powers of the greatest of them and the sets
of canonical variables are the standardized Hermite-Tchebycheff polynomials.

COROLLARY 2. The roots of the determinantal equation usually solved in the treatment of this problem
would be powers of the greatest of them but for disturbances due to sampling errors and difficulties
caused by grouping.
validity. The lowest of 


of minimum correlation has no 
alue will be due again 


Coro , 
LLARY 3. In the normal case the concept 
tely and deviations from this vi 


e non.; 
to Sampling e correlations will be p"-* approxima 
ing and difficulties caused by grouping. 
6. FISHER'S IDENTITY AND MEHLER'S IDENTITY

Consider the identity (5) in the limiting case when the contingency table with elements $a_{ij}/a_{..}$ [...] the frequency function $f(x, y)\,dx\,dy$. We have
$$f(x, y)\,dx\,dy = f_1(x) f_2(y) \Bigl\{1 + \sum_{i=1}^{\infty} \rho^i \psi_i(x) \psi_i(y)\Bigr\}\,dx\,dy, \tag{5 bis}$$
which is the Mehler series.

/ AND CORRELATION 


gency table Lancaster (1953) used the Hermite— 
vision used by Pearson. In the notation of 


7. Y, ‘Tu RELATIONSHIP BETWEEN ) 
mls In ] 
z ¥ OR e Gcr ing the partition of x? in a contin y 
his bapa, E to avoid the infinitely fine subdi 
e derived 
xla.. = R}+Ri+ tR 
e pp e T a pa-pa -p) (13) 
(14) 


Evi 
di 
s ently ytja.. 2 Fi 
Pontis or greater than any other observed correlation. A more general result is that in & complex 
cy table xla.. > Xn. (15) 
kd i 


. For if we partition x? making use of the 

be of the form yw; and 
being orthogonal to the first. The 
iti In the observed case, 
g maximized, may be 
that we cannot assert 


ore z, 
un e woi the observed correlations 
ron secon d Po of matrices (Lancaster, : 
th, Dining ed 2; /w;, where x is suitably normalize 
dig t ofc Ows may bo filled in with due regard to © a and y is bein 
tha, ent fr anonical variables for æ, when the correlatio zis being maximized. So 
hats | Pisdebwhen the correlation des T he canonical correlations. 


q 
.. is greater than the sum of the squares of t ns 
We can obtain a result analogous to (2):
$$\log(1+\phi^2) = -\tfrac12 \log(RR') \quad \text{(Pearson notation)}$$
$$= -\tfrac12 \log|1+P| - \tfrac12 \log|1-P|, \tag{16}$$
where $P$ is the matrix with elements $p_{ij} = r_{ij}$ for $i \ne j$, and $p_{ii} = 0$. If $\phi^2 < 1$, we may expand both sides, using the expansion of Durbin & Watson (1950) on the right. We have
$$\phi^2 - \tfrac12\phi^4 + \tfrac13\phi^6 - \dots = \tfrac12 \operatorname{tr} P^2 + \tfrac14 \operatorname{tr} P^4 + \dots, \tag{17}$$
and so $\phi^2 > \tfrac12 \operatorname{tr} P^2 = \tfrac12 \sum_{i \ne j} r_{ij}^2$. Even for $\phi^2 > 1$ this still holds, for
$$1+\phi^2 = \exp\bigl(\tfrac12 \operatorname{tr} P^2 + \tfrac14 \operatorname{tr} P^4 + \dots\bigr) \ge 1 + \tfrac12 \operatorname{tr} P^2. \tag{18}$$

There are various determinantal conditions on the $r_{ij}$ derived by Pearson, but it appears that they can all be summarized by saying that $(1+P)$ and $(1-P)$ of (16) are both positive definite. Pearson (1904) noted that the first is necessary for the system to be a correlation system and the second condition is necessary for $\phi^2$ to have a finite limit.
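The relations (16) to (18) may be checked numerically. The sketch below is illustrative only: the matrix `P` holds hypothetical correlations chosen for the example, and the function names are not from the paper. It evaluates the right-hand side of (16) from the two determinants, compares it with a partial sum of the trace series, and confirms the inequality in (18).

```python
import numpy as np


def log_one_plus_phi2(P):
    """Right-hand side of (16): -1/2 log|1+P| - 1/2 log|1-P|."""
    eye = np.eye(P.shape[0])
    sign_plus, logdet_plus = np.linalg.slogdet(eye + P)
    sign_minus, logdet_minus = np.linalg.slogdet(eye - P)
    if sign_plus <= 0 or sign_minus <= 0:
        raise ValueError("1+P and 1-P must both have positive determinant")
    return -0.5 * (logdet_plus + logdet_minus)


def trace_series(P, terms=200):
    """Partial sum of 1/2 tr P^2 + 1/4 tr P^4 + ... as in (17) and (18)."""
    P2 = P @ P
    power = np.eye(P.shape[0])
    total = 0.0
    for k in range(1, terms + 1):
        power = power @ P2           # power now holds P^(2k)
        total += np.trace(power) / (2 * k)
    return total


# Hypothetical correlations r_ij (zero diagonal), chosen only for illustration.
P = np.array([[0.0, 0.3, 0.2],
              [0.3, 0.0, 0.1],
              [0.2, 0.1, 0.0]])

if __name__ == "__main__":
    lhs = log_one_plus_phi2(P)
    print(lhs, trace_series(P))                      # the two values should agree
    print(np.exp(lhs), 1 + 0.5 * np.trace(P @ P))    # first should not be smaller
```

The guard on the determinant signs is a weaker check than positive definiteness; it is enough for this small example, where both matrices are diagonally dominant.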


Discussion 


correlations the corresponding 
cedure of considering the roots o. 
Specifies a bivariate 


(19) 
Pii = PiP (1 + prsy), ' 
and p52:0, so that 29, — pi 
for any pair of [A 

the maximizatio 
REVIEWS

Einführung in die mathematische Statistik. By L. SCHMETTERER. Vienna: Springer Verlag. 1956. Pp. xxii+405. £4. 3s. 6d.

In this book it is evident that mathematical rigour has had first call (indeed, the publishers [...] it is primarily for mathematicians) and a level of [...] is assumed in the reader, augmented by knowledge of the elements of the Stieltjes [...], elementary set theory, Borel measure of sets and the Lebesgue integral. But, for all this, the [...] has been kept to a minimum, and the general reader who is prepared to take a few theorems on trust will find this an eminently readable book. Thus the first chapter starts with the [...] axiomatics, developing the set theory as needed, and follows with the theory of characteristic functions omitting the proofs of the uniqueness and limiting theorems. The disadvantages of [...] approach have not been entirely avoided however, and the second chapter [...] distribution theory (together with the associated tests at an intuitive level) begins by proving Cochran's theorem, deducing, for instance, the distribution of the normal sample variance therefrom.

The third chapter gives a straightforward account of confidence intervals and the fourth gives a [...] but concise treatment of estimation theory. The fifth chapter giving the theory of tests [...] recently been recommended by Neyman in J. R. Statist. Soc. (It is a pity here that Wilks has not been given credit for his theorem on likelihood ratio tests and Kolodziejczyk's general linear hypothesis has not been given explicitly.) The sixth chapter develops multiple regression theory and the elements of multivariate analysis; Wishart's, Hotelling's, Fisher's discriminant and Mahalanobis's [...] are derived by elementary methods. The seventh chapter on 'non-parametric tests' is [...] satisfactory, being rather scrappy and eclectic (but it only claims to be an introduction). Ordered variables, the probability integral transform, the runs-, Kolmogorov-Smirnov-, [...] and sign-tests are briefly dealt with (Continental contributions being given here more credit than is perhaps their due) and there is little mention of power. The short eighth chapter treats briefly of the classical Bayes theory and its consequences in regard to modern theory. The whole is preceded by a full German-English glossary and followed by an index of moderate length. [Note: the index reference to χ² as on p. 379 must be a misprint for p. 279.]

Criticisms of the book are chiefly on the score of omissions and relative neglect, partly no doubt due to reasons of space and the nature of the treatment. I feel that a fuller treatment of χ² is deserved; contingency tables, index of dispersion, Eisenhart's theorem on the power and optimum grouping do not appear to be mentioned. More complex analysis of variance models and their powers should be at least described. Combinatorial methods get scant treatment (Mood [...] are not mentioned) and discrete variables also. I cannot find that the negative binomial, Polya's or Neyman's Contagious, distributions are mentioned, while rank correlation gets only a footnote reference. The distribution of the correlation coefficient, Smirnov's ω² and Neyman's ψ² also seem to be absent. Sequential testing could well be given a few pages rather than a bare mention. D. E. BARTON


Statistical Methods and Scientific Inference. By Sir RONALD A. FISHER. Edinburgh: [...]. 16s.

d authoritative account of Sir Ronald Fisher's 
however, advised not to try to read 
technical and sometimes controversial ; 
tributions, without always adequate 
y to repay study most by those 


'his 
i ibas 
Vie: Ok is welcome in providing an up-to-date and ; 
: i to statistics are 
i -o fairly 
anq 4). 200k entirely on its own. The issues are fair ly 
author's original con vam 
view. The book is in quence like 
ri istical methods. da 
the he first m rent statis d of the second chapter we first siens Io 
wi S8enti vo chapters are mainly historical, he end E pecans our e danh rm 
M Colo ial eant by pr : dum roo pe 
Sef r our : 
As on h 
ea ice is meant by statistical inference; for | e 
Stay; ^ Preted differently py diffe le. Thus in 
cs, by L. J. Savage, it is} 


more detal ) JOBS 
A er's experience is that the word statistics 
The Foundations 


t American book, 
matical statistics is to be 
` y rical and
i i * i eatment of problems of inductive inferenco . The men 
one pe ED wüh those types of Cdi gon pump y S m 
pue A he individual are considered, seems t. in 
the properties of the group rather than of = ; or statistical interpretation of probability, i 
is this aspect which has given rise to the frequency : veio omi ce pala Conte the 
i i bability as a degree of belief or credibility. j : xti 
contrast ee ee ie, inn probability defined by some aggregate or Lam s 
Pp um i any Pariular event, we must be unable to associate any — a ^ sosiiib 
diferent probability, to this same event. This is a relevant inductive a TH 
logically distinct from the concept of statistical probability itself, and [...] assigning to it any very precise meaning. It is well known that there are at least two rival schools of thought on statistical inference: either one can play down the statistical aspect and argue explicitly in terms of inverse probability, prior and posterior probabilities, and perhaps [...]; or one can attempt, as many statisticians do, to summarize statistical features of the data as an aid to, but not necessarily as a substitute for, one's final inductive inferences or decisions. The impression gathered from this book is that Fisher is putting forward methods that logically lie somewhere between these two schools, but in the reviewer's opinion these methods create difficulties and apparent inconsistencies
tages. Thus in Chapter 3 (bottom of p. 42) Fisher uses suc : AE te i 
o a single hypothesis by a unique body of observations’, bu aa dabis 
ion the concept of a random sa ple is applied to such onia Saved 

; in the Neyman-Pearson sense, of tho significance test IR has the 
vations is not sufficient; for example, any hand at bridge ‘ 


inter- 
isher test to have the froquoney iitude 
sticians arises in the reviewer's opinion from an inconsisten: 


p ae SUMA : "aem m an 
51/53, which is at one stage of the derivation considered rando 


ial 
- fiduci@! 
suggestion (p. 56) that no new axioms are needed for fid" 
inference would thus not be generally accepted. ` 


ill be 
of Fisher's theory of fiducial inference given in this book is as complete as W 
but the ‘theory’ when more than one 1j 


; e 
known parameter is present still seems 
atter of definition for each individual prol 
one parameter, 


o O0 
blem than a general method. In the cas 
the theory was accepted by mt 
which renders. i 


"Statisticians because of i 
t practically equivalent to the eory of (efficient) eonfide 
* Cf. E. S. Pearson’s remarks in 


statist. S0% 
‘Statistical concepts in their relation to reality’, J. R. Statist 
B, 17 (1955), 204-7, AS" A 


jon. 
ts frequency interpretati , 


à from 
nce intervals obtained 


T It is not the Primary intention of tho reviewer 
his earlier criticis 


A use ? 
- The criticisms by Fisher of the. 


Sdn heory 
© use of randomization in the th* gle 
n 


: into 
nt of the significance level, Laer iil 
m the observed variance ratio 51/5), is now available from Welch’s test (Biometrika, 3 
" r- 
(ii) (pp. 120 and 162) The reviewer’s original reference to the particular function w+o was © sor 
tunate, for he had overlooked the confidence (as well as fiducial) interval solution for the partic ye 
linear function p+ go this solution follows from the sam 
being independent of ¢ 


: caon- central 
pling distribution of the ‘non A solution 


: neo 
rs does not follow automatically from a mene i 
sampling distribution involving them Even in Fisher's fiducial theory, the problem of unique? 
hardly settled by the discussion of a, particular ex; 
In the book confidence intervals are almost entirely referred to in relation to [...], but as Fisher has rejected the frequency basis for fiducial intervals the value of confidence intervals in all cases must be considered. Such confidence intervals are admittedly [...] some applications than others; however, such routine statistical methods fortunately, if not [...] too automatically, often have an inductive value even when the data are more unique [...] of small sample theory, over which there is likely to be most argument, this [...] partly because statistical experience (external to the sample) on the nature of the [...] often exists. For fiducial inference Fisher has introduced the requirement (p. 56) that no prior information on the value of the parameter should be available. This is desirable if the fiducial inference is to be regarded as properly inductive, but the scope of fiducial inference seems now academically narrow, and the logical status of other prior knowledge or assumptions in the theory is left obscure.
The problem of specification arises again in the section beginning on p. 66 (cf. also p. 127), where [...] explicit record of the likelihood function, as it varies with an unknown parameter, as a summary of the data. The likelihood function is certainly relevant and important for many statistical purposes; nevertheless, any table proposed on the basis of a particular probability model [...] to be an adequate substitute for the data themselves, as the model might not be accepted as appropriate by other workers. Standard statistical methods become less available when the data, [...] phenomena under consideration are highly individual, and standard inductions likewise. [...] the likelihood function is accepted as appropriate, the circumstance that it does not involve the number of ways by which the observed outcome might have occurred implies that it is not dependent on the sampling procedure (for example, whether direct or inverse binomial sampling), a knowledge of which is sometimes required.
Apart from the concluding section on simultaneous fiducial distributions, the last chapter covers the theory of estimation associated with the name of the author; the importance and established value of this theory might well have justified its consideration earlier, except that it is now fairly familiar to those who, it has been here suggested, will most profit from this book. Such readers will be [...] but not otherwise diverted by Fisher's occasionally provocative style. The problem of inductive scientific inference, and in particular statistical inference, is one in which impartiality and tolerance are as essential to further progress as rules of procedure. Any suspicion of dogmatism, from whatever quarter, or however illustrious the person concerned, would be regrettable.
M. S. BARTLETT

Symposium on Monte Carlo Methods. Edited by H. A. MEYER. New York: John Wiley and Sons, Inc.; London: Chapman and Hall, Ltd. 1956. Pp. xvi+382. 60s.


Interest in Monte Carlo methods is increasing again after suffering a depression around 1952. It is reflected in [...] symposium held at the University of

Bora i ins papers presente! : 

A Wo 9n. 16 r ved d pe pe pol dese contributions headed by an introduction by 

M M : pes PT guide to the non-specialist; this traces the development of 

Pa Carlo since its christening in about 1947 and attempts to place the succeeding papers m 

Pecti EM c ng. vt 

, Throo ive, giving a brief description of most of them. f random variables. O. Taussky and J. Todd 

Ad & brief survey of th ilable me do random agen by Lime d 
ij. oS f the avail d ree 

i 368 and present the results of — — im T er pepe 

ransfo: 

j basic ; c 

eei bom pointed out in discussing thes 

en inconv 


3i : 
Probabili the important subject o 
and rej ity distributions. He discusse: 
MU ae methods. It should perhaps h J 
have m variable number of random numbers, 1S ofti 

nating 1 i igi latter that b 
x m remaining p ry from the straightforward to the highly ua ws eia € É = by 
F. apers va j i ere 
: on coy is outstanding Tr "acti troduced to a new 
vee? ciation *; i. ente principles of pool Monte ages 
i TO 

Peper wireducing technique. This is appli P 
bi o ont? its success is amply demonstrated. 

* explanation of the very heart of their idea. t 
Pong ahn gives a very useful collection of formul 3 o 
"ly used Monte Carlo tricks, for several of whi! 


i he now 
ults connected with many of ti 
ane is largely responsible. This is clearly the 
^ iffusi y-rays through shields
Bi etic ed fx IO iJ. Borger arnt ty bà. Bou ana RB hos, in which cen Do 
Mu reme See Um eaim deri i uO gun ona oile 
JT aman mat riis oo. D uiv a ed i cd or ape iy ep PR 
= ge Peel lene inel carum ire m 
on En Pa donec ee book is not new it is a valuable collection of ideas that had previously had only a limited circulation. It is a pity it could not have been produced more closely after the date of the symposium.
K. W. [...]


The Essentials of Educational Statistics. By F. G. CORNELL. New York: John Wiley and Sons Inc.; London: Chapman and Hall Ltd. 1956. Pp. 375. 46s.


h 
kose r P ok, thoug 
The author writes in his preface: ‘as an introduction to statistics this book is not a kandbo 

it contains more than e i 


; ls the 
: n equa 
all possible random samples of size n drawn from a population equals the mean of the population.
(b) The variance of the sampling distribution of means of size n is equal to 1/n times the variance of the population.
(c) The sampling distribution of means of samples of size n from a normal population is itself normal.


ibution of means of size n .. 
ribution as n becomes incre: 
ell to have inserted “with r 


tions 
- for a wide variety of non-normal popula 
asingly large. r where 
neci cie t possibly *independent "p quite 
he diseussion in tho text.] This mothoc how 
'. svoperties are 
8 why thoy are used and how their pr Oper tions 
ical symbols are used for criteria, test 
se the student who likes words, but no mathematics, some anxiety. t known): 
re the sampling distribution of the mean (Population ¢ known and ae bles, tW? 
for goodness of fit and in contingency t9 spelation 
inear regression and partial and multiple corre 
les of the normal curve and of t a 


is 
f : -has done 

A a misnomer. The link with education is tenuous and what the author has 

to write a very elementary statistics text-book. 


b- 
L ‘ged su 
in statistics. It is pe o 
pling methods and is extended by c^ re recens 
the illustrative data is mo - 
x nd t 
: discussion on the use of statistical methods anc 
interpretation of results, including an extensi 
sample. Then follows a seri 


* esigh 
ve section on the way to ask questions aah an 
b or 5 Presenting tho data obtained by means of © 
graphs, frequency distrib; 3 


ro 
Ther 
wness: 
culated measures of average, dispersion and skewnes 
are two cha fi
pters on sampling -— 
summarizes pling errors and statistical infi i 
zs those avai à : inference, two on h 
and cyclical di a ailable in the United States, three on the analysis E numbers, one of which 
n ria yi s mE pa Ly E B F 

meie And eomuut gis and two on regression analysis and correlation. There ee tae mee Seca 

numbers, the T à a ional methods, and on the symbols commonly ü5 Eu re appendices on arith- 

ormal integral and of values of ¢ but not of y?. Each rds dare Ti random 

AS owi y a short 


bibli 
bliography and a set of questions. 


he expositi 
frente oll and throughout is very clear and withi 
and e n 
nd completely. In an elementary text-book there is always a problem of deciding how 


detailed to mak i 
makes tho sr d exposition and how much to leave to the student's intelligence. Too much detail 
masters ADEM a ‘ore v student look worse than it is and he may learn a lot of detail before he 
pain this book ee eet E ge detail he may fail to follow essential steps. In m 
onderous, es y yu > s on the side of giving too much detail. vi i / 
might [uii emp bs those students who are Lond several — - AS wp 
meets thom. Fo o give the student the idea and let him gradually sort out the snags TENE as s 
too litsen z he does the student really need to be told that if he makes the scale of a chart 
the observations may occur outside the limits intended? On the other hand, detailed descriptions of the [...] which may arise in the calculation of an arithmetic mean from a frequency distribution instead of from an individual list and of the method of calculating the true mean when working from a guessed value of the mean without using algebra are very good indeed, though in the former [...] of recorded values tending to bunch, say at multiples of ten, might have been stressed. A [...] problem arising from assuming all values in a frequency class are concentrated at the mid-point of the class in the calculation of the standard deviation is not discussed in a similar way, and the correction factor [...] in the short cut method is merely stated, not explained.

Calculations in the text were obviously undertaken with assistance from a calculating machine and products and quotients are commonly shown to seven, sometimes to ten or eleven significant figures. Early in the book there is a discussion on spurious accuracy and on errors arising in rounding, but there is no indication later of the application of these excellent precepts. Nor is guidance given to the student as to what he can do in practice to avoid having to use as many significant figures as in the text of the book if he possesses, for example, not a calculating machine but only a slide rule or the tables of logarithms and square roots at the end of the book. The method used produces some curious results. For instance in the calculation of a correlation coefficient we have
$$r = \frac{135{,}919{,}105}{176{,}439{,}387} = 0.77,$$

n the field chosen most aspects of the subject are 


whilst the individual values of a trend line, fitted by the method of least squares, are given to seven significant figures where in only four out of thirty-five values does the fitted line represent the original series to the second figure. Looking at the calculations for this trend line made by two methods one may well doubt if it is true that the simplicity of the short cut method strongly commends it. The student might well also wonder why the short cut method for calculating a standard deviation gives a result, on [...], of 28.906 cents and the method working with deviations from the true mean gives a result, on the facing page, of only 28.9 cents, and this after having a sum of deviations squared calculated [...] figures.

The [...] student of economics using this book for an ancillary course in statistics will conclude that figures are not a scarce factor of production and hence go on to assume that any real short cut methods are added for academic interest. In general this is a good book for students who have the time and are prepared to learn the elements of a subject slowly but methodically, hence getting a good grounding.
H. S. BOOKER



Statistics: a New Approach. By W. ALLEN WALLIS and HARRY V. ROBERTS. Illinois: The Free Press, 1956. Pp. xxxviii+637. $6.00.

This is a text-book designed for those wishing to know about statistics who have no mathematical training. The logical processes of the mathematics of statistical theory are explained in words with a [...] (so the reviewer thinks) in favour of those interested in economic applications. The book is divided into four sections entitled, the Nature of Statistics, Statistical Description, Statistical Inference and [...]. [...] bivariate frequency distributions and measures of [...]. [...] Statistical Inference we have probability and randomness, [...] the normal curve, and some notes on estimation. The section of Special [...] containing the design of experiments, quality [...] regression and time series. Formulae are given freely but no derivations or justifications [...]. At the level aimed at the book may prove useful to the non-mathematician. [...] with knowledge of mathematics for whom, according to the preface, the book is also intended to be useful, are likely to go empty away.
F. N. DAVID



Theoretical Genetics. By R. B. GOLDSCHMIDT. University of California Press (for whom Cambridge University Press act as agents). 1956. Pp. 563. 64s.

Many readers of Biometrika are interested in genetics, and some of them have [...] contributed to its mathematical theory. All these could profitably read the [...] of Richard Goldschmidt, the grandmaster of contemporary genetics and president of the [...] International Congress of the subject. They might find (and perhaps be disturbed by their [...]) that genetics is not necessarily a happy hunting ground for statisticians having but little biological knowledge. While the book does not cover the more mathematical branches of genetics, such as population genetics or the theories of mutation and selection, it provides an exhaustive and authoritative discussion of the concept of the gene, as it developed during the author's lifetime and in fact to a [...] by his own stimulating influence. The main topics are (1) the nature of the genetical material ([...] pages), (2) the mode of action of the genetical material in controlling specific development ([...] pages), (3) the consequences of the nature and actions of the genetical material for evolution
not of course cover the entire field of modern genetics and their et 
eteness even within their restricted framework; but the discussio > 
subjective and selective i 


d rather 
of the ordinary text-books. He may also at times be amused by the exposure of premature an 3 
childish generalizations b: itori 


comparable to embryolo 
excellent book. Fifty p 


z MUS 
dices greatly add to its value. x: RAD 


Rank Correlation Methods (second edition). By M. G. KENDALL. London: Charles Griffin and Co. Ltd. 1955. Pp. 196. 36s.


Foundations of the Theory of Probability (second English edition). By A. N. KOLMOGOROV. New York: Chelsea Publishing Company. 1956. Pp. 84. $2.50.

This is a second edition [...] Wahrscheinlichkeitsrechnung by the world's leading
Irrationalzahlen (second edition). By O. PERRON. New York: Chelsea Publishing Co. 1951. Pp. 199. Paper $1.50; cloth $3.25.

This is a reprint of the second edition of Perron's well-known work. (The third edition, of 1947, differed from the second only in one page and a footnote.) The paper and printing are good, and the reviewer could find no serious misprint.

Starting from a set of twenty-one axioms satisfied by the rational numbers, the author defines irrational numbers as Dedekind sections of the rationals, and shows that they satisfy the same axioms. He goes on to deal with limits, powers and logarithms. Then without further use of analysis he discusses various methods (particularly continued fractions) of approximating to irrational by rational numbers; the degree of accuracy attainable is investigated. He concludes with a chapter on algebraic and transcendental numbers, proving that the numbers e and π are transcendental. This means that neither of them is a root of an algebraic equation, of any degree, with integral coefficients.

The work may be regarded largely as an introduction to the author's Die Lehre von den Kettenbrüchen. There is a good bibliography.


Trigonometrical Series (second edition). By A. ZYGMUND. New York: Chelsea Publishing Co. 1952. Pp. 329. Paper $1.50; cloth $4.95.

It is twenty-one years since Zygmund's Trigonometrical Series appeared in the Monografje Matematyczne. The present edition is of the corrected reprint of 1952. To have produced this beautifully printed [...] on excellent paper at $1.50 is a triumph of publishing economics.

There is still no book on the subject which compares with Zygmund's for clarity or thoroughness. It is a mathematician's book, but apart from a knowledge of Lebesgue integration it presupposes no [...] knowledge of mathematics. Although the character of the book is to approach questions of convergence and summability concretely, many readers will find in it an excellent introduction to more generalized theories such as that of Linear Operations. There is a long chapter on Riemann's theory of trigonometric series and a short one on Fourier's integral. Examples and miscellaneous theorems supplement the text, and there is an [...]. No student of advanced mathematics and no [...] should be without this classic of analysis; the publishers are to be congratulated on their enterprise in bringing the volume within [...].
H. KESTELMAN


of the Hungarian Academy of Sciences. 
through the trade organization ‘Kultura’, 


eM x 
e ic omatian] Institute of the Hungarian my i 
iqué that the above publication continues the series Publica 
e de l'Academie des Sciences de Hongrie s 


ki 
ofa 1 (1952), vol. 2 (1953), vol. 3 (1954)]. Tho chang? of the 
i Ed f the Institute on ] August 


© co 
Sciences. 


Pplieq 


name 0 


Tresponding change of the 
an Academy of 


Mathematics of the Hungari 


Methods in Numerical Analysis. By K. L. NIELSEN. New York and London: The Macmillan Company. 1956. Pp. 382. 48s. 6d.

'Practically every university is now teaching a course in numerical analysis' says Dr Nielsen in his Preface, 'and there is a need for [...] textbook'. This purpose [...] book fulfils. It has nine chapters: I, Fundamentals; II, [...] Differences; III, Interpolation; IV, [...]; V, Lagrangian Formulas; VI, [...] Equations and Systems; VII, [...] Difference Equations; VIII, Least Squares and their Application; IX, Periodic and Exponential [...].
R i leven
ibli i les, including ¢ 
There are also answers to exercises, a bibliography, and nineteen tables, 
i > ere 8 " rud E : dc dii 
paper sation for interpolation, Lat geen om eS d — Cá Nielson 
ï i i <planations ym $ d 
X B : tart with a list of expla i Hoa uninf and 
Securing his rear at the s i ing treated at length with mathema: Pru aA end 
d, many topics being tre: 3 d 7 Matrix inversi ; 
covers a good deal of groun <ample) briefly but with references. 3 inn 
de me ages, one worked example y : Estne ER 
ae 4 ep I aa oe BA linear equations are done exclusively poe by powers o f 
Les ow i djusting the elements of th X "III anc 
i ade of the requirement for adjus h vieta d 
pe iore Serial The treatment of least squares and data fittin g a satel to 9 
ind p adequately as a text for statistical computation, since it is dire 
W: 


i > 1l 
evaluation of the regression constants.

To the student, a particularly valuable feature of the book is the array of [...] and well laid out illustrative examples. It is also a useful reference book.


PUBLICATIONS OF THE U.S. DEPARTMENT OF COMMERCE, 
NATIONAL BUREAU OF STANDARDS 


: 31. / 
3 sag Series no. < 
(i) Tables of functions and zeros of functions. Applied Mathematics Serie: 
1954. Pp. ix+211. $2.25. à 


ji motions, 
This is a book containing sixteen divers tables of special functions and zeros of — 2y "They 
chiefly related to Bessel Functions or which have been derived as ancillary to their P p Math. 500- 
have almost entirely appeared previously in the J. Math. æ Phys. and the Bull. Amer. 
[1942-9]. Those in the former journal, being desi 
application, 


but we may mention Table 6, of the 
lating the p.d.f. i 


, 


5 atistic® 
gned as aids to physicists, have ue its tabu- 
Struve functions, which should prove use 


in a anc 
- X^. Tables 7 (of Fourier coefficients), 8 (of Sm adir 
cos w for x= 100( ing logarithms t 


rd table 

s’). Table 11 is, in fact, the now poris AE 
and Levenson. Each table i veded by an introductory essay, with fu = 
bulation of the function and its relations: 

er- 

T n " i " s e det 
(ii) Contributions to the solution of systems of linear equations and th 
mination of eigenvalues. E 


:og Series 
dited by Orea TAusskv. Applied Mathematics 
no. 39. 1954, Pp. 139. $2.00 


ks an 


E solution o 
examples, and A. I. and G. E. 


of the n x n matrix 
the rank of square 


ij-n.,» and table S-t 
matrices, location 
treated, 


of late 


nding Exponential. Applied 
Pp. 76. 56 cents. 


The negative exponential function, $e^{-x}$, is tabulated to 20 decimal places at intervals of 0.001 for x = 2.5 to x = 10.
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Ta " ; : 
EC of fim Cumulative Binomial Probabilities. (Ordnance Corps Pamphlet 
111389). Washington: U.S. Department of Commerce, Office or Technical Services. 


1952. Pp. viii+577. $6.00. 
Through the well-known relation

$$\sum_{r=c}^{n} \binom{n}{r}\, p^{r}(1-p)^{n-r} = I_{p}(c,\; n-c+1),$$

where $I_{p}$ is the incomplete B-function ratio, these tables may be used to give either the cumulative
binomial probabilities or incomplete B-function values. The function is tabled to 7 decimal places for
p = 0.00(0.01)0.50 and n = 1(1)150.
No information is given as to the method of computation.
Previous tables of this kind include:
the National Bureau of Standards Tables (Applied Mathematics Series 6, 1950) which covered
the range n=2(1)49, and were derivable from Karl Pearson’s Tables of the Incomplete Beta-Function
(Cambridge University Press, 1933);
H. G. Romig's 50-100 Binomial Tables (Wiley, 1953) for which n=50(5)100.
The present tables, therefore, cover important fresh ground for n>100.
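
The relation between the cumulative binomial sum and the incomplete B-function ratio quoted in this notice can be checked numerically. The following is only a sketch under stated assumptions: the function names are illustrative and do not come from the tables, the left-hand side is summed directly, and the right-hand side is obtained by Simpson's rule rather than by whatever method the computers of the tables used (the notice says none is stated). It assumes c >= 1.

```python
import math


def binomial_upper_tail(n, c, p):
    """Left-hand side: sum over r = c..n of C(n, r) p^r (1 - p)^(n - r)."""
    return sum(math.comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(c, n + 1))


def incomplete_beta_ratio(x, a, b, steps=2000):
    """Right-hand side: I_x(a, b), by composite Simpson integration.

    Hypothetical helper for illustration; intended for integer a, b >= 1,
    where the integrand t^(a-1) (1-t)^(b-1) is a polynomial.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    # log of 1 / B(a, b) = Gamma(a + b) / (Gamma(a) Gamma(b))
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return math.exp(log_norm) * total * h / 3


if __name__ == "__main__":
    n, c, p = 20, 5, 0.30
    print(binomial_upper_tail(n, c, p))
    print(incomplete_beta_ratio(p, c, n - c + 1))
```

The two printed values should agree to many decimal places, which is the sense in which one set of tables can serve for both functions.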


CORRIGENDA 


Biometrika (1955), 42, pp. 531-3
‘The likelihood ratio test for Markoff chains.’ By I. J. GOOD


Mr Leo A. Goodman has been kind enough to allow me to see some work that he has not
yet published in which, among other things, he points out that my paper in Biometrika,
42, pp. 531-3 contains a number of inaccuracies concerning the non-cyclic (non-
circularized) case. In addition he mentions the following errata.

Equation (7) should read K_1 = 2N log N, K_2 = 2N log (Nt).
In the last line of § 5, [...] should be replaced by [...].

For further details the reader is referred to Mr Goodman's forthcoming paper ‘Simplified
[...] tests for Markoff chains’.
I. J. G.
KARL PEARSON, 1857-1957
Being a Centenary Lecture by J. B. S. HALDANE, delivered at
University College London on 13 May 1957*


We are met here to-day to celebrate the centenary of the birth of Karl Pearson. To me, at
least, this means that I am glad that Karl Pearson was born, that I think the world is
better because he was born.
A greater man than any of us said

The evil that men do lives after them
The good is oft interréd with their bones.

Let us begin, therefore, with some criticisms. And then let us study not only those of
Pearson's contributions to science and culture which are widely known, but perhaps some
also which should be disinterred and brought once more to the light of day.

As [...] first stated, and Acton restated more precisely, all power corrupts. It is impos-
sible to be a professor in charge of an important department, and the editor of an important
journal, without being somewhat corrupted. We can now see that in both capacities
Pearson made mistakes. He rejected lines of research which later turned out to be fruitful.
He used his own energy and that of his subordinates in research which turned out to be
much less important than he believed. It is, however, very easy to say what any one
ought to have done fifty years ago!
This criticism can be, and has been, pushed much farther. It is said that Pearson
used a fundamentally false theory of heredity, and therefore of evolution, and that as
a consequence his work was not merely useless, but actually retarded progress. Had
Pearson become dictator of British research on heredity and evolution, this might have
been true. Fortunately he did not. I believe that his theory of heredity was incorrect in
some fundamental respects. So was Columbus’ theory of geography. He set out for China,
and discovered America. But he is not regarded as a failure for this reason. When I turn
to Pearson's great series of papers on the mathematical theory of evolution, published in
the last years of the nineteenth century, I find that the theories of evolution now
generally accepted are very far from his own. But I find that in the search for a self-
consistent theory of evolution he devised methods which are not only indispensable in any
discussion of evolution. They are essential in every serious application of statistics to any
problem whatever. If, for example, I wish to describe the distribution of British incomes,
the response of different individuals to a drug, or the results of testing materials used in
engineering, I must start off from the foundations laid in his memoir on ‘Skew variation in
homogeneous material’. After sixty-three years I can certainly take some short cuts
through the jungle of his formulae, some of which he [...] in later years. Very few
ships to-day follow Columbus’ course across the Atlantic.
Let me put the matter in another way. Any one reading the controversy between Pearson
and Weldon on one side, and Bateson and his colleagues on the other, which reached its
culmination about fifty to fifty-five years ago, might have said ‘I do not know who is right.
But it is certain that at least one side is wrong’. In fact both were right in essentials. The
general theory of Mendelism is, I believe, correct in a broad way. But we can now see that
if Mendelism were completely correct, natural selection, as Pearson understood it, could
not occur. For the frequency of one gene could never increase at the expense of another,
except by chance, or as we now put it, sampling errors. It is just the divergence between
observed results and theoretical expectations, to which Pearson rightly drew attention,
which gives Mendelian genetics their evolutionary importance.

* [...] which would be unjustified had [...] in an oration [...] into a form suitable for reading [...]
After this preamble I pass to my main task. Pearson's connexion with this College began
when he was nine years old, and was sent to University College School, where he remained
for seven years. He left at sixteen and obtained a scholarship at King's College, Cambridge,
at eighteen. As an undergraduate he studied mathematics and was third wrangler in 1879.
He had already shown something of his future mettle by a successful refusal to attend
divinity lectures. In spite, or perhaps because, of this independence of spirit he was elected a
fellow of King's in 1880. He spent about a year in the universities of Heidelberg and Berlin,
attending lectures on philosophy and Roman law as well as physics and biology. Perhaps
the most striking effect of his German year was to interest him in mediaeval and renais-
sance German literature, especially the development of ideas on religion and the position
of women. At about this time he began to spell his Christian name with a K instead of a C.
This may have been a homage to German culture. It may have been a special homage to
Karl Marx, for we know that he later lectured on Marx, and his daughter tells me that while
in Germany the police once searched his rooms, and he considered that one of Marx’s books
was the most subversive of the documents which they found there.
In 1880 he began the study of law in London, and was called to the bar in 1881. This
may have been a tribute to his father, who was a Q.C., or a means of ensuring a livelihood
in future, more probably both. He also published his first books, The New Werther and
The Trinity, a Nineteenth Century Passion Play. Both were anonymous, and had they
been signed, would certainly have prejudiced their author's chance of appointment in many
institutions, perhaps even in the Infidel College, which suffers from occasional outbreaks
of respectability. For both attack Christian orthodoxy.

It was at this period of his life that he lectured on Marx to small audiences in London,
on the ‘Ethic of Freethought’ at South Place, and to the Sunday Lecture Society on
‘Matter and Soul’.
In 1884, at the age of 27, he was 
Mechanics in this College. He had 


problems of applied mathematics, 
Common Sense of the Exact 


V 

ting to see that he regarded this as a worthy 

topic of academic study. May I hope that. j 
Professor may comment on Pe. 

He was clearly a very succe 
[...] Saint-Venant’s work on the theory of elasticity, and wrote [...] History of the Theory
of Elasticity. His radical [...]. He joined ‘The men’s and women’s club’, a small body
devoted to ‘the [...] discussion of all matters in any way connected with the [...] relation
of men and women’. As, in The Ethic of Freethought, [he expressed] the view that unmarried
women should be allowed sexual freedom, it is not surprising that legends arose, and still
exist, as to this club.* In fact Karl Pearson [...] Miss Sharpe. To-day it is quite normal for
a couple to discuss human sexual [...]. [...] as grossly improper, and all kinds of accusations
were made against those who did so. I have not the [...] that the male members of the club
were far less promiscuous than [...] contemporaries. If to-day association with prostitutes is
generally regarded as degrading, while seventy years ago it was generally condoned and not
rarely approved, we owe it largely to men like Karl Pearson.
The Ethic of Freethought was published in 1888, and is a collection of lectures and essays,
some of which had been reprinted as pamphlets. It is, in essence, a religious book. Pearson
defined religion as ‘the relation of the finite to the infinite’. ‘Hence’, he continued, ‘all
systems of religion are of necessity half truths.’ The most scholarly part of the book deals
with the history of religious systems, particularly in Germany. He believed that such a
study was part of the duty of an educated man or woman. I read a few sentences. ‘By
studying the past I do not mean reading a popular historical work, but taking a hundred,
or better fifty, years in the life of a nation, and studying thoroughly that period. Each one
of us is capable of such a study, though it may require the leisure moments, not of weeks,
but of years. It means understanding, not only the politics of that nation during those
years; not only what its thinkers wrote; not only how the educated classes thought and
lived; but in addition how the mass of the folk struggled, and what aroused their feeling
and stirred them to action. In this latter respect more may be learnt from folk-songs and
broadsheets than from a whole round of foreign campaigns.’
The book is largely a record of its author's search for truth among religious systems.
One chapter is devoted to the mystic Eckehart, and was the first introduction of that
remarkable thinker to the British public. Of all the systems examined there can be no
doubt that that of Spinoza appealed most deeply to Pearson; and he devoted another
chapter to demonstrating Spinoza's debt to Maimonides. If I may be allowed one
comment, which is in no sense a criticism, it is that Pearson's acquaintance with Indian
philosophy was confined to translations of Hinayana Buddhist scriptures. I think that he
would have recognized more kindred spirits in such ancient Hindu thinkers as Yajnaval-
kya, or the great anonymous humanist whose words are preserved in the first section of
the [...] Upanishad.
It is a little surprising that the title page does not mention the author's professorship at
University College. Perhaps his senior colleagues thought that such a mention would have
got him into trouble.
If, in 1890, one had had to pass judgement on Pearson, it might have run as follows,
‘[...] mathematics, and a scholarly compiler of the work of more original men. He has a
knowledge of literature and art most unusual in a professor of mathematics. He is somewhat
of a radical, but he is only thirty-three, and will [...] settle down as a respectable and
useful member of society, [...] if he lives to sixty. He will never produce work of great
originality, but [...] sorry to have appointed him.’ Had this judgement been correct, we
s[...]

* [As might be expected the critics tended to fix on only one side of the picture of the ideal
relationship between the sexes in a socialist state which Pearson elaborated in his lecture
‘The Woman’s Question’ (1885) and ‘Socialism and [...]’ (1880), afterwards published in The
Ethic of Freethought. Ed.]
[...] events occurred which, in my opinion, shaped the [...] of Pearson's future
life. He applied for, and received, the lectureship in Geometry at Gresham College. And
W. F. R. Weldon succeeded Lankester in the chair of zoology at University College. At
Gresham College he could lecture on what he pleased. His first set of lectures developed
into The Grammar of Science, his main contribution to philosophy. Later series were on
‘The Geometry of Statistics’, and ‘The Laws of Chance’. But since the treatment of
probability and statistical method in the first edition of The Grammar of Science is super-
ficial, we may take it that in 1891 he had not considered the subject seriously. He certainly
did so in later years. I have little doubt that the stimulus to do so came largely from
Weldon.


rial 

; . . at mate 

The Grammar of Science is a very remarkable book. Pearson claimed that material
objects were merely a conceptual shorthand used to describe regularities in our sense
impressions. This idea is hard to develop, if only because our language is in terms of
material objects such as eyes and brains. He did not in fact develop it without some self-
contradiction, at least on the verbal level. But he did so, in my opinion, with less
self-contradiction than contemporaries such as Mach and Avenarius. He must be regarded
as one of the founders of the important school

I can well remember the im 
1909. If it is less im 


have changed profoundly, a fact which would 


author. I do not personally think that Pearso 
theless, a man who first states an important 


ee "ge ips in 
and Empirio-criticism. This was an attack on people who, in his
words, or rather those of his translator, ‘under the guise of Marxism were offering some-
thing incredibly muddled, confused and reactionary’.


Now Lenin disagreed strongly with Pearson, and claimed, in my opinion correctly, to have
found self-contradictions in his arguments. Nevertheless, he found him vastly clearer than
other Machians. Let me read a few of Lenin’s sentences. ‘The philosophy of Pearson, as
[...] excels that of Mach in integrity and consistency’ (p. 119). ‘The
[...] expresses himself with characteristic precision, “Man is the
creator of natural law.”’
conscientious and 


to have 
[...] treatment of the subject. The only other contemporary British opponent of materialism to
whom Lenin was equally polite was James Ward. I cannot help thinking of Dante's
treatment of Saladin, who was, of course, in hell, but so far from suffering from heat, cold,
or other torments, was housed in a noble castle. Whatever may be the fate of Pearson's
[...] in his own country, The Grammar of Science is assured of attentive reading in
[...] where Leninism is orthodox.

[...] Pearson's own views, I quote three sentences from the
Grammar, which [...] the strength and the weakness of Pearson's approach to science. ‘The
unity of all science consists alone in its method not in its material.’ ‘No physicist ever saw
or felt an individual atom.’ ‘Atom and molecule are intellectual conceptions by aid of which
physicists classify phenomena and formulate the relationship between their sequences.’
The strength is shown by the fact that the distributions, which Pearson worked out to
[...] of populations of crabs, will equally well serve to describe
[...] stars, manufactured goods, durations of life, incomes, barometer readings,
[...]. The weakness is shown by the fact that physicists have, during this century,
[...] atoms, or rather atomic nuclei, by the tracks which they make when moving
rapidly. Pearson's philosophy discouraged him from looking too far behind phenomena.
[...], for this reason, that he never accepted Mendelian genetics, although the
Treasury of Human Inheritance and his own monograph on albinism contain plenty of
[...] in its favour.
[...] series of Gresham Lectures dealt with statistics and probability, particularly
[...] graphical methods of representing distributions. I have no doubt that they were
[...] as a result of the questions which Weldon began putting to him soon after
[...] University College. But his full answer to these questions is to be found in
the [...] series of memoirs on the Mathematical Theory of Evolution which were published
in the Philosophical Transactions of the Royal Society between 1893 and 1900. It is not
too [much] to say that the subsequent developments of mathematical statistics are largely
[...] on Pearson's work between 1893 and 1903. Perhaps we shall be helped to estimate
[...] by an exercise in hypothetics. What would have been the effect on Pearson
had [...] obtained the Jodrell Chair of Zoology in place of Weldon? And what would
have been the effect had our College contained an economist or engineer [interested] in what
is now called Quality Control? Although Bateson was as interested as Weldon in animal
variation he was more concerned with exceptions, and with discontinuous, or as Pearson
called it in 1899, exclusive inheritance. I doubt if [...]
in a form which would have aroused Pearson's interest. [...]
would probably have discovered what is now called Mendelism. [...]
reading Mendel’s paper, did not realize the necessity of dealing with large samples, which
Pearson certainly did.
[...] economist or technologist had interested him in the variation of manufactured
[...]. He would presumably have had to deal, as he did, with skew variation. He would
have used correlation to measure the likeness between the products of the same craftsman
or [...], as he in fact used it to measure the likeness between the children of the same
[...]. Perhaps in 1901 he might have founded [...] not wholly unlike
[...]. He would almost certainly have invented [...] of the statistical
[...] now used in industrial quality control. [...]
the industrial productivity of Britain in the early years of [...].
ars O: 
The papers to which I refer are hard to read because Pearson used [...]
algebraical and arithmetical methods which are now seen to be needlessly [...], and
which have since been simplified. As a humble tribute to Pearson I have, as [I think,]
simplified the first of them, which deals with the dissection of a skew frequency distribution
into two normal distributions. By an elementary transformation I have thrown his
formidable nonic equation into a form which allows numerical tabulation, and this
tabulation is now under weigh in the electronic laboratory of the Indian Statistical
Institute. I hope that as a result, the method will be available to statisticians less perti-
nacious than Karl Pearson.


Commenting on a particular passage in (I think) one of Beethoven’s works, a [...]
musical critic remarked ‘Hier ist Titanenthum Pflicht’. (Here titanicity is a duty).
Pearson attacked Olympus by piling Ossa on Pelion rather than by seeking an [...].
If we, his successors, have made statistical theory relatively easy, and much of Pearson's
mathematics are no longer used, we should remember that we are treading in the footsteps
of an intellectual titan.


of obtaining the observed sample. This method was developed by
Edgeworth as ‘the method of maximum credibility’, and by Fisher as ‘the method of
maximum likelihood’. In the succeeding paper, with Filon, Pearson developed it further.
Critics have asked why he did not generalize it. I think one possible answer is as follows.
The expression ‘the best’ is unfortunately seldom applicable to statistical estimates. The
best for one purpose is not usually the best for another. I think Pearson realized this.
Some of his successors have not.


In 1900 Pearson attacked the problem of eu 
curve to a series of data, for example, the num 
intervals such as 70-71 in., he asked what 
population truly represented by his curve should fit i 


: But the 
s approximation for some purposes. 
question arises ‘what is a bad fit?’ Is 38 a wor 


(It is not!) An 
arson solved th: 
which incre 


so 
e-fitting, and it is very mue, 
Ypothesis wherever the hypothesis 18 te 


to 
5 carson pointed out that it might be M 
discover whether a, number of sets 


'eement with h 
[...] more from 17 % than could rea[...]
[...] used as a test of homogeneity.
This is a commonplace with great human achievements. [...]
[...] carts, and so on. But to-day most wheels are used, not for the
[...] or power transmission. Perhaps the majority of wheels in England
are [...]. It was absolutely characteristic of Karl Pearson that his intellectual
[...] often extremely general. He obtained a solution of a problem which was
[...] that it had entirely unexpected applications.

But for this very reason it often had a limited applicability to the problem for which it
[...]. In the last few years many experiments have been done on
artificial selection for metrical characters, particularly in Drosophila and mice. Their
results during the first few generations are often much as Pearson would have expected.
But after [...]. In spite of this they are best described by the use
of the [...] tools which Pearson first applied to such problems, that is to say by
describing [...] in the moments of character distributions, and simple functions of them
[...] standard deviations and correlations. One can only defeat Pearson intellectually
with the weapons which he himself forged. If I may be allowed to quote William Blake,*
Pearson’s main service to humanity was


In all his ancient strength to form the golden armour of science
For intellectual war.

About the same time he began not only to use data collected by Galton and others, on
[...] other animals, but to collect his own. Among the important biological results of
this [...] were the demonstration that fertility is inherited both in our own species and
[...]. As an example of his thoroughness I mention his measurements of the same
[...] bones after various periods of wetting and drying, which never changed their length
[by as much] as 1%, though they did change it.
It was probably through Weldon that he came [to know] Galton. This very remarkable
man had, among other things, invented the recognition of criminals by finger prints, and
[...] (as may be seen from pages 185-207 of Inquiries into Human Faculty).
. €h Of Pearson's work in the '00's was 2 development of the notions used by Galton in 
there is no reason to think that Pearson's one 
wed anything to Galton. This is described in 
902 alleged to be on the mathematical theory of 
a series of measurements made on the same 
peria] by Dr Alice Lee, Dr Udny Yule, and Pearson, and by Lee, Dr Macdonell, and 
i t acteristic bias and a characteristic spread 
ted was the discovery that the errors made 
h thetically. In fact in one series, Lee and 
, edo 2 a à He attributed this to 
e JEn: CA at ag m ttributed it to telepathy. 
i sfide E n ith Lee. Bramley d Beeton on the 
ty. 1 asit say more for the value of this work 
i eory which I published in 1949, 


hich to base & th 


* Vala, or The Four Zoas (End of last Night).
and which I venture to [...] Pearson found surprising at the [...].
[...] of Biometrika was published, partly [...] Royal
Society, although it had awarded him its fellowship in 1896 and its Darwin Medal in 1898,
objected to publishing advances in mathematical and [...] in the same
paper! Biometrika has not fulfilled what Galton, in its first number, [...] to be the
primary object of biometry, namely ‘the discovery of incipient changes [...] which
are too small to be otherwise apparent’. The reason for this failure is simple. The [rate]
of increase in tooth length during the evolution of the horse since the Eocene [...]
to have been about 4% per million years. Such evolution could not be detected in a [...]
lifetime. But the aims stated in the editorial introduction, presumably the [joint work of]
Pearson, Weldon and Davenport, were fulfilled. In particular the first [...]
a paper by Weldon on variation in snails whose importance he did not live to [...]. He
found that natural selection in a snail species weeded out extremes, reducing the standard
deviation of a metrical character without affecting the mean. We now know that this
centripetal selection is very common. Had Weldon lived longer he would [...]
discovered this, and the whole history of biometry would have been very different.
In 1903 Pearson's Department received a grant of £500 from the Drapers’ Company, and
these grants, at the rate of £500 per year, continued till 1932. In 1903 this sum was worth
about £3000 or more to-day, and went partly in the payment of Dr Lee and other com-
puters, partly for instruments, and partly for printing.
I have no idea how Pearson obtained this money. We may be sure that he did not either
flatter rich men or promise to improve the national health and intelligence in their lifetimes.
Perhaps Galton had the ear of some rich acquaintances. Perhaps too, at that time, the
ruling classes were less permeated than now with the ferocious contempt for the pursuit of
knowledge for its own sake, which is voiced in the Archbishop of Canterbury's sermon of
March 24, 1957. To-day it is not hard to get money for research which may have economic,
military or hygienic advantages. It is extremely hard to do so for the mere search for [truth].

About this time Pearson began the series of papers on human biology for which he is best
known in some quarters, and the majority of which, I think, were joint work. Even when
his name did not appear on papers, I think our chairman will agree that nothing was
published from his laboratory without his imprimatur, and some of such work must at least
briefly be considered here.

Many of these papers are as fresh to-day as when they were written. To take an example,
in my opinion nothing since written on human craniometry has in any way superseded [...].
Some of this work was, at the time, [...]
particularly of the Treasury of Human Inheritance. This is still
indispensable. Neverthele[ss ...]
[...] prove the value of immunization to diseases. These attacks were fully justified. Mental
defect is certainly not a Mendelian character. As Pearson and Jaederholm showed, the
distribution of intelligence quotients in defectives is the tail of a nearly normal frequency
distribution. Forty-three years later we can say a great deal more about it. We can say,
for example, that phenylketonuria is a chemically definable character inherited as a
[...] recessive, and accounting for perhaps 1% of certifiable mental defect. But the
mental defect of phenylketonurics is graded, and a few of them are stupid, but not
sufficiently so to be classed as feeble-minded. In fact the diagnosis of phenylketonuria
[...] Penrose to dissect the distribution of human intelligence quotients into two very
different but still overlapping distributions. The notion that such a dissection is possible
was Pearson's first contribution to biometry.


Again, one series of memoirs was entitled Studies in national deterioration. This is a
polemical title. And it is a fact that as regards most measurable characters the nation has
not deteriorated. It may have done so as regards its ‘nature’ or inborn capacities. I think
that if Weldon had lived Pearson would have realized the ubiquity of centripetal selection,
and that in fact both the most successful and the least successful members of society were
breeding more slowly than those a little below the median. It is, however, easy to be wise
after the event. Moreover, Pearson and his colleagues were completely right in one respect.
Even if, in spite of his predictions, the nation has improved in some measurable directions,
it would have improved more if, say, a million children who were born to unskilled labourers
had been born to skilled workers, teachers, and the like.
No such criticism is possible of the mathematical tables which he edited, and in whose
computation he played a large part. Their utility was well shown by the fact that ‘pirated’
editions of them were soon published in America. The subsequent development of statistics
is largely based on them. Even the advent of the electronic computer has not yet super-
seded them. They were published from 1914 to 1934, and the Tables of the Incomplete
Beta-function, published in Pearson's seventy-eighth year, were his last, but not his least,
contribution to science. It appears that no one has yet discovered how to use an electronic
computer as efficiently as Pearson used his teams of devoted, painstaking, and remarkably
accurate, lady assistants.
In 1911 Galton died, and left funds for the endowment of a Chair of Eugenics, of which
Pearson became the first occupant. At last he was able to give up the teaching of applied
mathematics to engineers and physicists, and in the next year the present laboratory was
begun. Fortunately it was completed by 1914 though it was commandeered as an annexe
to the [...] hospital, and he did not get [...]. From 1914 till 1919 he did very
little but war work, first for the Board of Trade, [...] of trajectories [...]
artillery. When in 1920 the Department of Applied [...], he was
sixty-three years old, and he had, among other things, to develop a new course of lectures
and practical work. In 1923 he [...] papers which combined biometrical and
historical research. He was able to measure [...] of a number of distinguished men
and compare them with contemporary portraits. In the course of this work he played the
detective, and reconstructed the murder of Lord Darnley, second husband of Queen Mary
of Scotland; and his comments on the history of the [...] in Scotland are well
worth reading. To the same period belongs his [...] which involved much
historical research.

I have devoted this lecture entirely to [...]. If he could hear it I
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believe his main criticism would be that I have said too little about his fellow workers ; for 
much of his work was in collaboration. He had a wonderful gift for inspiring loyalty in his
colleagues, of which more will be said to-day by others. I myself only met him frequently
in the last years of his life, and can merely say that he was most gracious to me, even though
my outlook on many biological questions was very different from his own. He resigned his
Chair in 1933, but published one book and at least three scientific papers before his death
in 1936.

It remains to say a few words about what Samuel Butler would have called his life after
death, the results which are still accruing from his original thought. To begin with, ...
he foundations which he laid. If we sometime 


'arried on by three professors.* Under his son the 
ading teaching department, in that subject im 

5, to take only one example of its work, continue his 
father's great tradition, Prof. Fisher, who succeeded him as Professor of Eugenics, did NONE 
great services to statistics in simplifying and rendering more accurate a number of statistical 


m moir 
uch to Pearson’s great mer 


E me a ñi "s 
ultural experiment an exact art. To mention only one of F isher 


My own department of biometry has not bee; 


r 4 n so fortunate, [ was Professor of Genetics 
= n : Pres m have accepted the Weldon Chairhad I not been promised accommoda- 
i lometrical work, Owing to the war and for i i not 
> other reasons, this promise was 
— have been unable to Carry out the duties of this chair adequately m my opinion the 
| pu lometrie work twenty years has been Carried out b Teissier and Schreider 
in France, and. by Mahalanobis and his colleagues in India, In afi a tl 
been able to start new lines of biometri i : £ 
rical research, Į think i : the work © 
10pe in press, in Which he ur ue 
Pearson left it in 1993. A study 


of the last 


ould also include 
Engineering. 


... tions, and that both of these alter in a characteristic manner during a season. I believe that
the opportunities for Biometric research are now better in India than in Britain, and for
this reason among others I have thought it my duty to migrate there. To quote Karl
Pearson's most loved poet:

In Vishnu-land what avatar?


Whatever the fate of Pearsonian biometry in Britain, I believe that it will live and flower
in India.

To me at least there seems to be an element of hypocrisy about the present celebrations. 
I believe that we should be honouring Pearson more effectively if, to take one example out 
of many possible, we ensured that the College Library possessed copies of all his works, 
placed where students could consult them, than by making speeches and eating food. 
I mention this particular example as I have been trying, without the faintest success, to 
secure such accessibility for at least ten years. 

Pearson's work for free thought and the emancipation of women has been successful in
this country, if not always as quickly so as he hoped seventy years ago. The history of art
is now taught in this College. His work for socialism has not been as successful here as he
hoped. Nor would he have approved of many features of the socialistic systems of the
Soviet Union and China. Here again I believe his real heirs are to be found in India, where
the editor of Sankhyā, the Indian Journal of Statistics, is also the principal planner of the
approach to socialism under the second Five-Year Plan.

I fully realize that I have not done justice to my subject. The task is an
impossible one. No one man now alive could do justice to the breadth of Pearson's
interests and achievements. But I thank you for joining with me in celebrating the
memory of this great man.
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AN ANALYSIS OF THE DATA FOR SOME EXPERIMENTS 
CARRIED OUT BY GAUSE WITH POPULATIONS OF THE 
PROTOZOA, PARAMECIUM AURELIA AND 
PARAMECIUM CAUDATUM 


By P. H. LESLIE 
Bureau of Animal Population, Department of Zoological Field Studies, Oxford 


1. INTRODUCTION 


" = " t- 
In his book, The Struggle for Existence, Gause (1934, pp. 96 et seq.) describes a very interesting
set of experiments


aurelia and P. cauda 


* e 
comparable Stochastic processes at s 
k E s it 
ite variance due to the sampling of eac 


*- 




mix " 

... total numbers thereafter increased until there might be around ... individuals in the case
of some cultures. There was, however, a considerable variation between the replicates at each
of the successive sample censuses.

... dealing here with a set of experimental populations in which the numbers ... particularly
small. We might assume, therefore, as a first approximation that the changes in the mean
values of these stochastic processes could be described adequately in terms of a simple
deterministic model; and the object of the present paper is to examine these data in the
light of this hypothesis.


2. MATHEMATICAL MODEL 
The mathematical model which will be used in the analysis of these data is the familiar one
associated with the names of Lotka (1925, 1932) and Volterra (1926, 1931). Suppose we
have two species of living organisms $S_1$ and $S_2$, and the populations consist
respectively of $N_1$ and $N_2$ individuals at any given moment. If either of these species were
living alone, under some constant set of physical conditions, such as the temperature,
..., the hydrogen ion concentration of the medium, and so forth; and no
limitations whatsoever were placed upon its increase in numbers; and if the amount of
a particular source of food supply is unlimited; then we assume that the species
would increase in numbers at a rate defined by

$$\frac{dN}{dt} = rN = (b - d)N,$$

where $r$ is the intrinsic rate of increase, representing the difference between the 'birth-rate', $b$,
and the 'death-rate', $d$, of the species under the given conditions. In a finite environment,
however, when some source of food supply is assumed to remain constant in amount, an
approach to this intrinsic rate of increase would only be made when the number of
individuals is relatively small, and we suppose that any increase in numbers will tend to have
an adverse effect on the relative rate of increase of the population. Hence, when limitations
either of space, and/or food supply, are placed upon increase in numbers, we have in
general for a particular species living alone

$$\frac{1}{N}\frac{dN}{dt} = F(N),$$

where the function $F$ is subject to the condition,

$$\frac{dF}{dN} < 0.$$

As a first approximation, the simplest function which fulfils these conditions is evidently

$$F(N) = r - aN,$$

where $a$ is a positive parameter; so that we have the well-known logistic differential equation

$$\frac{dN}{dt} = (r - aN)N.$$

Whence, by integration,

$$N = \frac{K}{1 + Ce^{-rt}},$$

where $K = r/a$, and $C$ is a constant defining the initial state of the system.

When the two species $S_1$ and $S_2$ are competing together in a limited environment, the
common food supply remaining constant in amount, we suppose that any increase in
numbers of the species $S_1$ will tend to have an adverse effect, not only on its own relative
rate of increase in numbers, but also on that of the species $S_2$; and, similarly, any increase
in numbers of $S_2$ will have an adverse effect both on its own relative rate of increase and
on that of the species $S_1$. We have, therefore, the perfectly general equations for
the interaction between two competing species, under some constant set of physical and
environmental conditions,

$$\frac{1}{N_1}\frac{dN_1}{dt} = F_1(N_1, N_2),$$

$$\frac{1}{N_2}\frac{dN_2}{dt} = F_2(N_1, N_2),$$

where the functions $F_1$ and $F_2$ are subject to the conditions,

$$\frac{\partial F_1}{\partial N_1} < 0, \quad \frac{\partial F_1}{\partial N_2} < 0, \quad \frac{\partial F_2}{\partial N_1} < 0, \quad \frac{\partial F_2}{\partial N_2} < 0.$$

For our present purpose it is not necessary to discuss the properties of this general set
of equations. Clearly, by an extension of the argument used above for the case of a species
living alone, the simplest forms of the functions $F_1$ and $F_2$ which fulfil these conditions are
a set linear in the variables $N_1$ and $N_2$. We write, therefore,

$$\frac{dN_1}{dt} = (r_1 - a_1N_1 - b_1N_2)N_1,$$

$$\frac{dN_2}{dt} = (r_2 - a_2N_2 - b_2N_1)N_2,$$

where $r_1$ and $a_1$ are the logistic parameters for the species $S_1$, if it were living alone, and
similarly $r_2$ and $a_2$ those for the species $S_2$; while the positive parameters $b_1$ and $b_2$ define
the degree to which each species affects the other.

wi u of 
n © was carefully sti d a sample 
0:5 c.c. (one-tenth of the total volu id stirred an 


nae 
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of every 24hr. of—on the average—one-tenth of the total number of individuals present 


at that time. 

If we define $n_t$ as the number of individuals counted in the sample of 0.5 c.c., then the
estimated total number of individuals in each culture, at the time it was removed from the
incubator, is

$$N_t = 10n_t.$$

After renewal of the medium, each culture when it was returned to the incubator therefore
on the average contained, at the beginning of the interval $t$ to $t+1$,

$$N'_t = 0.9N_t$$

individuals. We assume, in the case of each species when living alone, that during this interval
$t$ to $t+1$, the population was growing in numbers according to the logistic differential
equation

$$\frac{dN}{dt} = (r - aN)N, \qquad (3.1)$$

and we require estimates of the parameters $r$ and $a$, given the values of $n_t$ at successive
intervals of time.

From (3.1) we have by integration

$$N_t = \frac{K}{1 + Ce^{-rt}}, \qquad (3.2)$$

whence

$$(K - N_t)/N_t = Ce^{-rt}. \qquad (3.3)$$

At time $t+1$,

$$N_{t+1} = \frac{K}{1 + Ce^{-r(t+1)}}.$$

Substituting (3.3) in this last equation, and writing $\lambda = e^r$, we have, after a little
rearrangement, the following equation relating the number of individuals at the end of the
interval $t$ to $t+1$ with the number at the beginning,

$$N_{t+1} = \frac{\lambda N_t}{1 + \alpha N_t},$$

where $\alpha = (\lambda - 1)/K$.

Since we have defined the number of individuals at the beginning of each interval as
$N'_t = 0.9N_t$, owing to the destruction of a proportion of the population in the sample discarded
after counting, we have for the successive intervals,

$$N_{t+1} = \frac{\lambda N'_t}{1 + \alpha N'_t} \quad (t = 0, 1, 2, \ldots), \qquad (3.4)$$

where $N'_0$ is the number of individuals with which the cultures were started. From (3.4)
we have

$$\frac{N'_t}{N_{t+1}} = \frac{1}{\lambda} + \frac{\alpha}{\lambda}N'_t, \qquad (3.5)$$

a linear relationship from which $\lambda$ and $\alpha$ can be determined, and hence the parameters
$r$ and $K = r/a$ of the logistic equation.

It will be noted that (3.4) can also be written

$$N_{t+1} = \frac{\lambda' N_t}{1 + \alpha' N_t},$$

where $\lambda' = 0.9\lambda$ and $\alpha' = 0.9\alpha$; and $N_0 = N'_0/0.9$. Thus, since $N_t = 10n_t$, if we work in terms
of the numbers recorded by Gause, each multiplied by a factor of ten (although this is not
really necessary), the estimated total numbers of individuals in the population, allowing
for the activities of the observer, also follow a logistic curve defined by

$$N_t = \frac{K'}{1 + C'e^{-r't}}, \quad K' = r'/a', \qquad (3.6)$$

provided we adjust the initial numbers by dividing them by a factor of 0.9. The relationship
between $K'$, $r'$ and the $K$, $r$ of (3.2) is given by

$$r' = r + \log_e(0.9), \qquad K' = \frac{(0.9e^r - 1)K}{0.9(e^r - 1)}.$$
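The relations between the difference-equation parameters and the two logistic curves can be checked numerically. The following is a minimal sketch, not part of the original paper: the function name, the dictionary keys and the values $\lambda = 2$, $\alpha = 0.001$ are illustrative assumptions, and the only formulae used are $r = \log_e\lambda$, $K = (\lambda - 1)/\alpha$ and the expressions for $r'$ and $K'$ given above.

```python
import math

def logistic_parameters(lam, alpha, survival=0.9):
    """Convert (lam, alpha) of N_{t+1} = lam*N'/(1 + alpha*N') into logistic parameters.

    `survival` is the fraction of the culture left after the counted sample
    is discarded (0.9 in the experiments described in the text).
    """
    r = math.log(lam)                      # intrinsic rate of increase, r = log_e(lambda)
    K = (lam - 1.0) / alpha                # upper asymptote of the undisturbed logistic
    a = r / K                              # since K = r/a
    r_prime = r + math.log(survival)       # r' = r + log_e(0.9)
    K_prime = (survival * lam - 1.0) * K / (survival * (lam - 1.0))
    return {"r": r, "K": K, "a": a, "r_prime": r_prime, "K_prime": K_prime}

# Illustrative values only (not estimates from the paper).
params = logistic_parameters(2.0, 0.001)
```

The same $K'$ is obtained as $(\lambda' - 1)/\alpha'$ with $\lambda' = 0.9\lambda$ and $\alpha' = 0.9\alpha$, which is a convenient cross-check on the algebra.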


Thus, in the case of thes 
living alone, two logisti 
(3:2) which in term 


n each species of Paramecium is 
- There is, in the first place, equation 
verage change in numbers of ap 
any way; and secondly, we have 


"ver, in SO 
pulation and the observer, in 


, transforming the estimated 
one into the other whenever necessary. 


4. EXPERIMENTS WITH
The data for 


each surviving replicate ia 

€ experiments. From day t 

the individual replicates, us 

P. 99): * A separat, i number of individuals in every 

and we began to take average samples 

Working from an origi time, therefore, we have the following 
any estimates of the variance between replicates. 


from similar cultures,’ 
information for making 


Deas us 
Type of culture Nö: of | 
| replicates | Days 
| 
| — e e —-— ee = | 
| Ramis, | od s 
a alone | 4 2-13 
| 3 14-19 


| 
The values of the parameters in the logistic equations for the two species were estimated
from the mean values ($\bar{n}_t$) of each set of replicates, as given by Gause in his table, by means
of the equation (3.5), viz.


where N, = 107 TM 

Be]: wd e ag N E Of for t = 2,3,4,...,25. No census apparently was taken at 

Sito fn seg pad ain in both cases, and in order to retain this information in the calcu- 

fia whos m 4(20/N;) was taken as an estimate of N//N,. This is quite a close aj i- 
n N and N, are small compared with the upper asymptote in numbers. Writing 


hack NiNaa =Y: N, =2; 
a ; . 
ve the linear relationships between these variables 
E (y- g) = (2-9). 
ere is S 
NN E 2 a question as to the best method of estimating the parameter b = «JA. We 
Which we um ing, in the usual way, y as & dependent and a as an independent variable, for 
quire an estimate of the regression coefficient b of y on a. Moreover, both these 
a method suggested by 


variabl 
es ar C z 5 
e subject to error. Actually, in the present instances, 


Rhode, 
s Diae: 
(1940) for estimating the parameters of a logistic by means of a very similar type 
lue of the parameter b was taken to be 


of li x : 
near relationship was adopted, and the va 
in) 
b= a]? 
X(v—&) 
using the entire data recorded by Gause from day 0 until day 25. As a result the following
estimates of the parameters $\lambda$ and $\alpha$ were obtained.

                  $\lambda$       $\alpha$
P. aurelia        2.4905      0.00026195
P. caudatum       2.2042      0.00058189
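The way such estimates lead to expected sample counts can be sketched as a repeated application of (3.4). This is an illustrative reading of the procedure rather than the author's own computation: the function and argument names are assumptions, while the starting number of 20 individuals, the absence of a census on day 1 and the removal of one-tenth of the culture at each later census are taken from the text.

```python
def expected_sample_counts(lam, alpha, start=20.0, days=19, sample_fraction=0.1):
    """Iterate N_{t+1} = lam*N'_t / (1 + alpha*N'_t); return expected counts per sample."""
    expected = {}
    n_prime = start                        # N'_0: individuals with which the culture was started
    for day in range(1, days + 1):
        n_total = lam * n_prime / (1.0 + alpha * n_prime)    # N_t at the time of the census
        if day == 1:
            n_prime = n_total              # no sample was taken on day 1, so N'_1 = N_1
        else:
            expected[day] = sample_fraction * n_total        # one-tenth of the culture is counted
            n_prime = (1.0 - sample_fraction) * n_total      # the counted sample is discarded
    return expected

aurelia = expected_sample_counts(2.4905, 0.00026195)
caudatum = expected_sample_counts(2.2042, 0.00058189)
```

Under these assumptions the P. aurelia values for days 2 and 4 come out at about 12.2 and 56.0, in line with the expected column of Table 1.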


The numerical values of the parameters $\lambda$ and $\alpha$ were then inserted in equation (3.4),
and starting at $t = 0$ with $N'_0 = 20$ individuals in each case, the expected total number of
individuals at time $t$, and hence of the number in the sample removed and discarded at
that time, were calculated by means of a repeated application of the equation. (Since no
sample was taken at $t = 1$, we have $N'_1 = N_1$ in both cases.) The expected mean numbers of
individuals ($\bar{n}_t$) in the sample of 0.5 c.c. are given in Table 1, together with the observed
mean numbers from $t = 2$ to $t = 19$. The calculations were stopped at this point because this
was the last day on which the individual replicates were recorded, and also
because it was evident that the upper asymptote in numbers for the theoretical curves was
attained by this time.

A test of the 'goodness-of-fit' of the calculated curves can be made by comparing the
inter-replicate variance between days 2 and 19 with the mean square deviation between
expected and observed in Table 1. There are, in each case, eighteen observations
from which the latter can be calculated, and the inter-replicate variances are based,
respectively, on 36 and 48 degrees of freedom. If we were to apply the usual variance ratio
test of significance, then taking as the number of degrees of freedom, $n_1 = 18$ and $n_2 = 36$
or 48, the 5 % points of $F$ would be approximately 1.89 and 1.81. But, it is not clear
whether we would be justified in applying this test to the present data. Although the
successive daily samples withdrawn from each population are independent, the numbers of
individuals in a particular population observed over a period of time are likely to be
correlated. A replicate, for instance, which is greater than the mean at some given time is
likely also to be greater than the mean on the next day. Leaving aside this point, however,
a rough test of the goodness-of-fit in the present examples would be to regard as
unsatisfactory any ratio between these variances which was greater than two.

Because the magnitude of the figures recorded for the individual replicates differed quite
considerably, not only between themselves at some given time, but also as the population
increased in numbers, the calculations were carried out in terms of the logarithms (to the
base 10) of the numbers observed. Estimating the pooled inter-replicate variance in the
usual way, by eliminating the variance between days, the following values of $s^2$ were
obtained for the period of time 2-19:*


| 
| D.F. a | 
“ape 
P. aurelia . | 36 |  goosges 
P. caudatum | 48 | 
| | 


| 
001657 | 
| 


e. 
Thus, since in Table 1 them 


ean numbers of P. aurelia 
we should expect the mean va: 


" cates: 
are each based on three replica 
riance between log,, expec 


à the 
re based on four replicates, eder 

3 aro 
Xpect a mean variance of ar 


A 


y inter-replicat, e 
the corrected y? was for P. aurelia, 15-70 plioa l 
nclude 
take 


* ere 
the time order, 1, 2,3, ..., 18, in which they 9-61. 


0 
vas 0-42, and for P. caudatum, T = +0? < 
are significant (P = 9-02-0-01, and P <0-001, respectively)- The 
ing in terms of | ; 


: tion 
... $\log_{10}$, the variance between replicates is some function ... the use of the pooled
inter-replicate variance, as a measure of the 'goodness-of-fit' ..., could be misleading. However,
as we shall be comparing it only with the mean square deviation between expected and observed
over the same time period, this objection is perhaps not very serious. Any agreement between the
two figures would suggest that, if there were any bias in the fitting, the degree of this bias
was not very great.

we have the following mean square deviations, $\Sigma\Delta^2/18$, contrasted with those expected
from the inter-replicate variances.

                  $\Sigma\Delta^2/18$     Expected $s^2$
P. aurelia        0.00246      0.00194
P. caudatum       0.02738      0.00460

Table 1. The observed and expected mean number of individuals ($\bar{n}_t$) in 0.5 c.c. of
culture for populations of Paramecium aurelia and P. caudatum living alone


P. aurelia P. caudatum ` 
Time 
m  —— - 
(days) — ex Rem ACE 
Observed Expected Observed Expected 
2 14 12:2 10 as 
3 34 26-6 10 WM 
4 56 56-0 n 
; u m 36 $1 
H 18) n 104 1158 
à € ed 137 143-0 
: pt poe 165 1622 E 
10 507 493-9 mi une 
11 580 511:5 i 1543 
12 610 5197 e NO 
13 513 523-5 182 186-9 
14 593 525-2 
| 192 187-4 
7 526-0 1876 
15 551 5263 179 | as 
le 560 : o. | 
17 522 526-4 one 1878 
18 565 | m 209 187:8 
517 | 526° 
Cle ation is of much the same order as 
arly, i : are devia i : or 
th Y, in the case of P. a urelia, the mean M excessive: being very nearly six times 
ion of this dis- 


e s 
RE while for P. caudatum jt is mar mide 1 that the major porti 
an s? : = ipis clear from > i ved’ for da; s 4 and 
5 “Pancy is due hia gei ie between ‘expected se a ae een and 
F O a marke! time’ gr «red 
fS rdg s nearly three” - sin Table 1, no simple 
d ay feed cte a pages e seen by st pe numbers of pcne 
Oye, OD such a "afe hes H be expect apart from the entries for days 
4 *t the enti lee jcates- oweveh able, for we have for the 
RA P development of e pnt entries is gm = , 
ma; ; "9 ppea; it for the T gi-0 $ 
a pears that the fit nd the expecte es 


ng 16 days XA?/16 = 0-006558. 


` . . at’ 1 
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:udging by 
i 5 in the case of P. caudatum, and judging : 

gus tipici. aep mna a Crisi between the caleulated and observed n 
hrs ser mee side one might expect from the degree of variation between rep ers 
There is one further point which should be noted. If we examine the signs of the
differences, 'observed' minus 'expected', in Table 1, there is a suggestion that the
negative signs tend to occur more frequently in the earlier part of the series and the
positive signs more frequently in the later stages. In other words, these logistic curves
tend to overestimate the population in the earlier stages of the growth and to
underestimate it in the later stages. The degree of this bias is, however, not great; and
considering that the logistic is being applied merely as a first approximation, this is
probably as good a fit to the observed data as might be expected from the use of such a
simple model.


5. GROWTH IN MIXED CULTURES

When both species of Paramecium are living together in the same microcosm, we consider
the pair of differential equations

$$\frac{dN_1}{dt} = (r_1 - a_1N_1 - b_1N_2)N_1,$$

$$\frac{dN_2}{dt} = (r_2 - a_2N_2 - b_2N_1)N_2, \qquad (5.1)$$

in which $r_1$ and $a_1$ are the logistic parameters for the species $S_1$ and similarly $r_2$ and $a_2$
those for the species $S_2$, while $b_1$ and $b_2$ are a measure of the effect which each species has
on the other. ... closely related ... very simplified experimental conditions in which these
two ..., we might assume that the magnitude of the effect which ... a special case of (5.1)
in which we have,

$$b_1 = a_2, \qquad b_2 = a_1.$$

Then, by subtraction,

$$\frac{1}{N_1}\frac{dN_1}{dt} - \frac{1}{N_2}\frac{dN_2}{dt} = r_1 - r_2,$$

and integrating,*

$$\frac{N_1(t)}{N_2(t)} = \frac{N_1(0)}{N_2(0)}e^{(r_1 - r_2)t}. \qquad (5.2)$$

Thus, if our assumptions are correct, two consequences should follow.

(1) If we define $S_1$ as the species having the greater intrinsic rate of increase ($r_1 > r_2$),
then as $t \to \infty$, the numbers $N_2$ of the species $S_2 \to 0$, and ultimately the species $S_1$ will
persist alone.

(2) If we take the natural logarithms of the ratios $N_1(t)/N_2(t)$, they should be linearly
related to $t$, the slope being equal to the difference between the intrinsic rates of
increase of the two species when each was living alone.

* For footnote see opposite page.

In the previous section we have estimated that the intrinsic rate of increase ($r = \log_e\lambda$)
of P. aurelia, when living alone, was $r_1 = 0.9125$; while for P. caudatum, $r_2 = 0.7904$; the
difference between these estimates being $r_1 - r_2 = 0.1221$. We therefore define P. aurelia as
the species $S_1$ and P. caudatum as the species $S_2$, and we should expect that the latter
would tend to disappear from the mixed cultures. It is apparent from the data
given by Gause (Table 3, p. 144) that this was the case in these experiments. In order to
see whether the second consequence of our assumption is fulfilled, we must estimate the
difference $(r_1 - r_2)$ from the data for the mixed cultures. Since the same proportion of the
total number of each species was removed and discarded at each sample census, we may
work in terms of the mean number of individuals ($\bar{n}$) per 0.5 c.c. of culture given by Gause.
Then defining

$$y_t = \log_e \bar{n}_1(t) - \log_e \bar{n}_2(t),$$

it will be found that the relation between the successive $y_t$ and $t$ is approximately linear.
Fitting a straight line

$$y = kt \quad (k = r_1 - r_2)$$

passing through the origin (since $\bar{n}_1(0)/\bar{n}_2(0) = 1$ is given), the regression coefficient $k$ was
estimated from the 24 points available from day 2 to day 25 as

$$k = 0.11382 \pm 0.0034.$$

This value is remarkably close to the difference $r_1 - r_2 = 0.1221$ which was estimated from
the data for each species when living alone.
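The fitting of a line through the origin can be sketched as follows. This assumes that the regression coefficient is the ordinary least-squares slope $k = \Sigma ty/\Sigma t^2$, which the text does not state explicitly; the function name is hypothetical. The six ratios are a subset of the observed values in Table 2 (days 2, 5, 9, 14, 20 and 25), used only to show the call, so the resulting slope will differ from the estimate based on all 24 days.

```python
import math

def slope_through_origin(times, values):
    """Least-squares slope k of the line y = k*t constrained to pass through the origin."""
    return sum(t * y for t, y in zip(times, values)) / sum(t * t for t in times)

# Subset of the observed ratios P. aurelia / P. caudatum from Table 2.
days = [2, 5, 9, 14, 20, 25]
ratios = [1.00, 1.84, 3.15, 5.78, 7.00, 17.50]

# y_t is the natural logarithm of the ratio of the mean numbers.
k_subset = slope_through_origin(days, [math.log(x) for x in ratios])
```

Constraining the line to pass through the origin uses the known starting ratio of one, so that only the slope has to be estimated.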
was discussed by 


ntial equations (5:1), when b, = c s 
ir. If we substitute (5:2) in thefirst and 
1), then the resulting equa 
if we put 


* 
Thi á 
Ties special case of the differe: 
Second 2 (1920) in his original memo 
can be e embers, respectively, of (5° 
xpressed as functions of time. Thus, 
N= K= 1/4 etree 
th Nyi=¥ K,= Toldos 
ese i 
© integrals can be written as, 
a(t) = Ki" 4103 e 0L 
Lt Kzoo Oso 
f the system. 
eue 7 Jso have to allow for the 


since we 8 
must express them in the form 


-rit 
, 


y(t) = Ky 


wher 

a , Cí and Cj are constants defining the 

Activitic equations, however, are of little use m 
es of the observer. In order to apply them 


N,(t+)) = fal Nall Nal 
d from y(t) in Y 


to Gause 

(a= 1,2). 

Thi 1-4 1), and usin 5.2). If we 
is can be done by substituting from z(t) in a(¢+ 1), ani (¢+ 1), an g (5:2) 


Write fi 
Or si fA 
simplicity, "TS 


A295 2 
y= (&— 1)/By a, = (Ag— Uf 2 


Weh 
ave finally, after a little rearrangement, 
N,(t+)) = paN +N 


BE o ; 
pra ems x 


NAtt+) = 


which is the system of equations used later in §6.

The observed ratios of the mean number of individuals counted in the samples from
the mixed cultures are given in Table 2, together with those expected from the equation

$$\bar{n}_1(t)/\bar{n}_2(t) = e^{kt}.$$

In order to see whether the departure from expectation was at all marked, the same
procedure was adopted as in the previous section, and the pooled inter-replicate variance in

Table 2. The observed and expected values of the ratio of the mean numbers,
Paramecium aurelia/P. caudatum, growing in mixed culture

           Ratio $\bar{n}_1(t)/\bar{n}_2(t)$                  Ratio $\bar{n}_1(t)/\bar{n}_2(t)$
Time                                   Time
(days)    Observed    Expected        (days)    Observed    Expected
   2        1.00        1.25            14        5.78        4.88
   3        1.91        1.41            15        6.44        5.46
   4        2.00        1.57            16        6.60        6.12
   5        1.84        1.76            17        8.07        6.85
   6        2.30        1.97            18        7.46        7.67
   7        ...         2.21            19        6.55        8.59
   8        ...         2.47            20        7.00        9.62
   9        3.15        2.77            21        8.25       10.77
  10        2.95        3.10            22       17.50       12.07
  11        4.59        3.47            23        ...        13.51
  12        3.64        3.89            24        9.43       15.13
  13        6.18        4.36            25       17.50       16.94

Expected = $e^{kt}$.

terms of $\log_{10} n$ was estimated from the data for the mixed cultures. These variances were

P. aurelia 0.01319, P. caudatum 0.02423,

while the covariance was 0.0007267, each of these estimates being based on 36 degrees of
freedom. Clearly, the correlation ($r = +0.041$) between the numbers of the two species,
when the time trend is eliminated, is negligible. Thus, since each $\bar{n}_t$ is in both cases based
on three replicates, we should expect the variance of

$$y_t = \log_e \bar{n}_1(t) - \log_e \bar{n}_2(t)$$

to be given by

$$\operatorname{var}(y) = \tfrac{1}{3}(2.3026)^2(0.01319 + 0.02423) = 0.06614.$$

From the fitting of the regression line passing through the origin, the discrepancy between
observed ($y$) and expected ($Y$) was, for the 24 points,

$$\Sigma(y - Y)^2/(n - 1) = 0.06213,$$


replicates.

6. A FURTHER TEST OF THE HYPOTHESIS

There is another way of testing whether this simplified model of the competition equations
is a satisfactory description of these data. In §4 we estimated the parameters $\lambda$ and $\alpha$ in
the logistic equation,

$$N_{t+1} = \frac{\lambda N'_t}{1 + \alpha N'_t},$$

from the data for each species when living alone. When, in the special case of (5.1), we have
$b_1 = a_2$, $b_2 = a_1$, there is an analogous set of equations when the two species $S_1$ and $S_2$ are
in a state of competition, viz.

$$N_1(t+1) = \frac{\lambda_1 N'_1(t)}{1 + \alpha_1 N'_1(t) + \alpha_2 N'_2(t)},$$

$$N_2(t+1) = \frac{\lambda_2 N'_2(t)}{1 + \alpha_1 N'_1(t) + \alpha_2 N'_2(t)}, \qquad (6.1)$$

where $\lambda_1$ and $\alpha_1$ are the logistic parameters for the species $S_1$ when living alone, and similarly
$\lambda_2$, $\alpha_2$ those for the species $S_2$. In both equations we have

$$N'_1(t) = 0.9N_1(t) \quad \text{and} \quad N'_2(t) = 0.9N_2(t),$$

in order to allow for the removal of one-tenth of the existing population whenever a sample
census was taken. Then, given the numerical values estimated in §4 for

P. aurelia,    $\lambda_1 = 2.4905$, $\alpha_1 = 0.00026195$,
P. caudatum,   $\lambda_2 = 2.2042$, $\alpha_2 = 0.00058189$,

we can calculate, given that $N'_1(0) = N'_2(0) = 20$, the average numbers of individuals in
the mixed cultures at successive intervals, and therefore the expected mean number
of each species in the samples which were discarded from day 2 onwards.*

These expected mean numbers can then be compared with the numbers actually
observed by Gause in the mixed cultures. This is quite a stringent test of the model and of
the simplifying assumption, since any error in the estimates of the parameters made from
the data for each species living alone would have a cumulative effect in any such
step-by-step calculation, and the estimated mean numbers would tend to diverge
more and more from those actually observed.

The results of this calculation are given in Table 3 from day 2 until day 25 in the
experiments. It will be seen from Table 3 that in the case of P. aurelia the expected numbers
follow closely those observed until about the sixteenth day; but that, certainly after day 19,
they then tend to diverge from the latter. On the other hand, the expected curve for
P. caudatum appears to give a reasonably good fit to the observed data throughout the
entire period. The discrepancy between the observed and expected numbers is of much the
same order of magnitude as that which might be expected from the inter-replicate variances.

* In carrying out this step-by-step calculation, it must be remembered that since no sample
was taken at $t = 1$, we have $N'_1(1) = N_1(1)$ and $N'_2(1) = N_2(1)$.

Working in logarithms to the base 10 as before, and defining $\Delta$ as the deviation between
log 'observed' and log 'expected' numbers in Table 3, we have

                 Period                    $s^2$ expected from
Species          (days)     $\Sigma\Delta^2/n$      inter-replicate variance
P. aurelia        2-19      0.00511       0.00440
                  2-25      0.00724
P. caudatum       2-19      0.00775       0.00808
                  2-25      0.00995
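The step-by-step calculation based on (6.1) can be sketched in the same way as for a single species. This is an illustrative reconstruction, with hypothetical function names; it assumes, as stated in the text, that both species start at 20 individuals, that no sample is taken on day 1 and that one-tenth of each population is removed at every later census.

```python
def competition_step(n1_prime, n2_prime, lam1, alpha1, lam2, alpha2):
    """One interval of (6.1): both species share the same denominator."""
    denominator = 1.0 + alpha1 * n1_prime + alpha2 * n2_prime
    return lam1 * n1_prime / denominator, lam2 * n2_prime / denominator

def expected_mixed_counts(lam1, alpha1, lam2, alpha2,
                          start=20.0, days=25, sample_fraction=0.1):
    """Expected mean numbers of each species per sample, from day 2 onwards."""
    expected = {}
    n1_prime = n2_prime = start
    for day in range(1, days + 1):
        n1, n2 = competition_step(n1_prime, n2_prime, lam1, alpha1, lam2, alpha2)
        if day == 1:
            n1_prime, n2_prime = n1, n2    # no census on day 1: N'(1) = N(1)
        else:
            expected[day] = (sample_fraction * n1, sample_fraction * n2)
            n1_prime = (1.0 - sample_fraction) * n1   # sample counted and discarded
            n2_prime = (1.0 - sample_fraction) * n2
    return expected

# Parameters estimated in §4 for P. aurelia (species 1) and P. caudatum (species 2).
mixed = expected_mixed_counts(2.4905, 0.00026195, 2.2042, 0.00058189)
```

With these inputs the values for days 2 and 3 come out close to the expected entries of Table 3 (about 11.7 and 9.2, then 24.5 and 17.0). Because each step feeds on the previous one, any error in the four parameters accumulates, which is why the text regards the comparison as a stringent test.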
Table 3. The observed and expected mean number ($\bar{n}_t$) of individuals in 0.5 c.c. of culture
in the mixed populations of Paramecium aurelia and P. caudatum
P. aurelia P. caudatum 
Time | 
(days) NT OT ee 
Observed Expected Observed Expected 
— e ru E = 
2 10 117 10 9-2 
3 | 21 24-5 ll 17-0 
4 58 47-9 29 29-4 
5 92 84-7 50 46:0 
6 202 131.8 88 63:3 
7 163 179-9 102 76:5 
8 221 221-0 124 83-2 
9 293 253-2 93 84-4 
10 236 278-4 80 82-1 
li 303 299-1 à 
12 302 3171 s3 133 
ig 320 333-5 55 68-2 
T 387 348-8 67 63-1 
i $35 363.2 52 58-1 
ET 363 376-8 ; 
A 323 3897 40 48.8 
p t 4017 48 44-5 
p 20% 418-0 47 40-5 
20 350 423.5 
50 36:8 
21 330 433-9 
22 350 ae 40 m 
23 350 450-5 = oa 
24 330 458-1 Fi ir 
25 350 465-1 » 24 
20 21-9

It will be seen that up to day 19, in the case of both species, the discrepancy between
observed and expected is no greater than that which might be expected from the degree of
variation between the three replicates on which the means are based. Even up to day 25
the overall agreement is reasonable; but, as has been mentioned earlier, it is
evident that in the case of P. aurelia, from about day 16, the expected and
observed curves are diverging. However, considering the approximate nature of the
parameters estimated from the data for each species when living alone, it is remarkable how
closely we have been able to anticipate, as it were, the results actually observed in the
experiments when both species were living together in the same microcosm.


7. CONCLUSION AND SUMMARY 
It seems reasonable to conclude, as a result of this analysis, that the deterministic model
associated with the names of Lotka and Volterra is a remarkably good description of the
changes which occurred in the mean values of these stochastic processes. The two main
points leading to this conclusion are:

(1) The first approximation in this model for a species living alone, namely the logistic
equation, is a good fit to the data for P. aurelia, as judged by a comparison of the mean
squared deviation between expected and observed with the variance between replicates.
In the case of P. caudatum this equation was not a good fit; but this was due almost entirely
to a relatively large discrepancy between expected and observed on 2 days. If these were
neglected, the fit of the logistic to the remaining observations for this species would be
reckoned as satisfactory.

(2) Given the logistic parameters for the two species when living alone, and assuming
that a special case of the competition equations in this model was applicable to these
two types of Protozoa, it should theoretically be possible to calculate the expected
numbers in the sample censuses when the two species were competing together for a common
food supply. The agreement between the expected numbers calculated in this way and
those actually observed was very much what might be expected from the variation between
replicates in the competition experiments.
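
The deterministic model referred to in this summary can be illustrated numerically. The sketch below integrates the usual Lotka-Volterra competition equations by simple Euler steps; with the competitor absent they reduce to the logistic equation. The parameter names (`r1`, `k1`, `a12`, and so on) and all numerical values are hypothetical choices for illustration only and are not the estimates obtained from Gause's data.

```python
def competition_step(n1, n2, r1, r2, k1, k2, a12, a21, dt):
    """One Euler step of the Lotka-Volterra competition equations."""
    d1 = r1 * n1 * (k1 - n1 - a12 * n2) / k1
    d2 = r2 * n2 * (k2 - n2 - a21 * n1) / k2
    return n1 + dt * d1, n2 + dt * d2


def simulate(n1, n2, days, dt=0.01, **params):
    """Follow both species for the given number of days."""
    for _ in range(int(round(days / dt))):
        n1, n2 = competition_step(n1, n2, dt=dt, **params)
    return n1, n2


# Hypothetical parameters, chosen only to make the example run.
params = dict(r1=1.0, r2=0.8, k1=100.0, k2=100.0, a12=0.5, a21=0.5)
alone = simulate(2.0, 0.0, days=50, **params)      # logistic growth of species 1
together = simulate(2.0, 2.0, days=100, **params)  # both species competing
```

With the second species absent the first settles at its saturation level `k1`; with both present, these particular values lead to coexistence below that level.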
ON THE DISTRIBUTION OF TRIBOLIUM CONFUSUM
IN A CONTAINER*


By D. R. COX AND W. L. SMITH


1. INTRODUCTION

Neyman, Park & Scott (1956) have recently discussed a number of statistical problems
arising from studies of the flour beetle, Tribolium. In one experiment they describe, a
container was filled with fresh flour occupying a volume of a cube 10 × 10 × 10, and into
it a large number of Tribolium confusum beetles was placed.


Table 1. Distribution of Tribolium over a plane cross-section
(averaged experimental data)

                   First quarter           Second quarter

                  3.2     3.5     4.0     6.4    12.8
                  2.5     1.8     2.0     3.7    14.3

                  3.5     5.0     4.2     6.8    11.8
                  1.8     1.3     2.2     3.3    12.3

                  4.0     4.2     5.2     8.0    11.6
Third quarter     2.0     2.2     2.1     4.0    12.5

                  6.4     6.8     8.0     9.2    13.8
                  3.7     3.3     4.0     7.2    18.6

                 12.8    11.8    11.6    13.8    17.9
                 14.3    12.3    12.5    18.6    30.4
Obtained from the data quoted by Neyman et al. (1956) by the process described in the
text. The upper figure is the density of females, the lower that of males. The density is
symmetrical in the square. Thus, the corner value of Table 1 is the average of the four
'corner' values given by Neyman et al. (1956).

* Statistical Laboratory, University of California, Berkeley; partially supported by funds
provided under contract with the Air Research and Development Command, USAF School of
Aviation Medicine, Randolph Field, Texas.
The density for the females falls gradually towards the centre of the square from a
relatively low maximum at the corner. The density for the males falls more sharply from a
high maximum at the corner and is substantially constant in the central portion. The
general feature of a gradual increase of density towards the edges and along the edges
towards the corners is, however, the same for both sexes.

In discussing some possible models to explain these phenomena, Neyman et al. (1956)
remarked that, up to the time of their writing, no model producing the gradual increase in
density observed was available. The object of the present note is to discuss a simple
probability model that does produce a density distribution which is qualitatively similar
to the one observed.


2. PREVIOUS WORK 


Broadbent & Kendall (1953) had some success in describing the two-dimensional motion of
Trichostrongylus retortaeformis by simple Brownian motion. Sherman investigated the
consequences of such a model in the present context, and showed that a random walk over
a square lattice with inelastic boundaries could produce a concentration of beetles at the
boundaries and in the corners, but could not account for the gradual increase in density
towards the edges. In a further investigation (Sherman, 1956) of the one-dimensional
random walk, he showed that distributions more like the observed one could be obtained if
a suitable boundary condition was inserted. This was that after striking the boundary, the
moving point remains there a randomly distributed time, and is then placed instantaneously
a finite distance within the region of motion. Sherman did not, however, claim this boundary
condition to be a reasonable explanation of the experimental results.


3. POSSIBLE MODELS 

One approach is to assume that the motion of the beetle is described by a more complicated
random walk whose steps are not infinitesimal, and may indeed be correlated in direction
and length; see, for example, the type of walk investigated by Daniels (1952). Suppose that
this is combined with a boundary condition of the following type. On meeting the boundary
the beetle remains there for a time-interval which has a certain frequency distribution, and
then pursues a path that is the reflexion in the boundary of the path that would have been
followed in the absence of the boundary. It can be shown by the method of images that for
any such walk, without drift, which leads in the case of unrestricted motion to an
asymptotic normal distribution of indefinitely increasing dispersion, the limiting
distribution within the square boundary is uniform except for a line concentration on the
boundary. This result applies also for such a two-dimensional motion within a circular
barrier, and is probably true for a very wide class of boundaries.

One objection to the Brownian motion models is that the path of the beetles is highly
irregular, as if they were constantly forgetting their direction. This objection does
not apply to the random walks just discussed; the failure of the latter to produce the
required distribution must therefore be connected with the use of an unrealistic boundary
condition. Indeed, the assumption of perfect reflexion seems a most unrealistic
description of the behaviour of Tribolium.

In the present note we shall show that by adopting a more realistic boundary condition,
even with a grossly oversimplified description of the beetles' paths within the flour, we
can obtain a distribution of beetles of the type observed.
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4. GENERAL DESCRIPTION OF THE MODEL 


The basic assumption is that the motion is two-dimensional and that, within the flour,
a beetle follows a straight path. When it meets the boundary it may (i) with probability $p$
return along its original path; or, (ii) with probability $1 - p$, move a distance $\tau$
along the boundary before choosing a new direction of motion, independently of its previous
direction.

The distance $\tau$ and the angle $\theta$ that the new direction of motion makes to the
normal to the boundary at the point of departure are assumed to have distribution functions
$H(\tau)$ and $G(\theta)$. We assume for simplicity that the beetle moves with constant
speed, although this assumption may be greatly generalized without affecting the final
results.

In spite of the simplicity of the model which we have just described, we have been unable
to obtain an explicit solution to the problem of limiting beetle distribution within a square
boundary. We shall, therefore, discuss first the solution for a circular container.


5. SOLUTION FOR A CIRCULAR CONTAINER

Assume that the motion takes place within a circle of unit radius. Let $\gamma_r$ denote a
concentric circle of radius $r < 1$. Let $\Omega$ denote a path, defined to be the total track
of the beetle between two successive selections of a new direction of motion within the
flour, i.e. a path consists of one chord of the circle, possibly traversed several times,
followed by an arc of the circumference of the unit circle.

Consider a large number of such paths developed according to the distributions $H(\tau)$,
$G(\theta)$ defined above. The probability that a beetle is inside $\gamma_r$ is then the
total length of parts of paths $\Omega$ within $\gamma_r$, divided by the total length of all
paths. Thus, it is easy to see that the probability $\Pi(r)$, say, that a particular beetle is
within $\gamma_r$ is given by

$$\Pi(r) = \frac{\text{mean length of that part of a path which is in } \gamma_r}{\text{mean length of a path}} = \frac{\mu(r)}{\lambda}, \text{ say.} \tag{1}$$

This argument assumes that the spatial distribution of the beetles comes to statistical
equilibrium and that probabilities and time-averages can be equated. These points can be
justified, and (1) derived directly, by the theory of regenerative stochastic processes
(Smith, 1955).

The problem therefore resolves itself into the geometrical question of the value of
$\mu(r)$. To calculate $\mu(r)$ note from Fig. 1 that the chord length in $\gamma_r$ is
$2(r^2 - \sin^2\theta)^{1/2}$ and that, after making allowance for the probability of
traversing the chord several times,

$$\mu(r) = \frac{2}{1-p}\int_0^{\sin^{-1} r} (r^2 - \sin^2\theta)^{1/2}\,dG(\theta). \tag{2}$$

Similarly,

$$\lambda = \bar\tau + \frac{2}{1-p}\int_0^{\pi/2} (1 - \sin^2\theta)^{1/2}\,dG(\theta), \tag{3}$$

where $\bar\tau$ is the mean of the distribution $H(\tau)$.
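
The ratio (1) is easy to evaluate numerically once $G(\theta)$ is specified. The following minimal sketch assumes that $\theta$ is uniformly distributed on $(0, \pi/2)$ and uses a plain midpoint rule; the function names and the default values of `p` and `tau_bar` are illustrative assumptions, not part of the original analysis.

```python
import math


def _midpoint(f, lo, hi, n=20000):
    """Midpoint-rule approximation to the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def prob_within(r, p=0.0, tau_bar=0.0):
    """Pi(r) = mu(r) / lambda for theta uniform on (0, pi/2)."""
    g = 2.0 / math.pi  # assumed density G'(theta)
    # mu(r): mean length of the part of a path inside the circle of radius r
    mu = 2.0 / (1.0 - p) * _midpoint(
        lambda t: math.sqrt(max(r * r - math.sin(t) ** 2, 0.0)) * g,
        0.0, math.asin(r))
    # lambda: mean boundary distance plus mean total chord length
    lam = tau_bar + 2.0 / (1.0 - p) * _midpoint(
        lambda t: math.cos(t) * g, 0.0, math.pi / 2)
    return mu / lam
```

For small $r$ the value is close to $\pi r^2/4$, which is smaller than the value $r^2$ that a uniform density over the unit circle would give; this is the thinning out of beetles near the centre that the model is intended to reproduce.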
To evaluate these integrals, we shall write

$$G'(\theta) = a + b\,|\sin\theta| + c\sin^2\theta, \tag{4}$$

where higher terms could be calculated, if required. Write

$$K(r) = \int_0^{\pi/2} (1 - r^2\sin^2\theta)^{-1/2}\,d\theta, \tag{5}$$

$$E(r) = \int_0^{\pi/2} (1 - r^2\sin^2\theta)^{1/2}\,d\theta, \tag{6}$$

for the complete elliptic integrals of the first and second kinds, respectively. Also put

$$\Theta_n(r) = \int_0^r \frac{t^n\,dt}{\{(1-t^2)(r^2-t^2)\}^{1/2}}. \tag{7}$$

It is then easily shown from (4) that

$$\Pi'(r) = \frac{2r}{(1-p)\lambda}\,\{a\Theta_0(r) + b\Theta_1(r) + c\Theta_2(r)\}. \tag{8}$$

Fig. 1. A typical path.

The functions $\Theta_n(r)$ satisfy simple recurrence relations (Byrd & Friedman, 1954) and for
$n = 0$ through 4 they are given by

$$\Theta_0(r) = K(r); \qquad \Theta_1(r) = \tfrac12\log\frac{1+r}{1-r}; \qquad \Theta_2(r) = K(r) - E(r);$$

$$\Theta_3(r) = \tfrac12\left\{\tfrac12(1+r^2)\log\frac{1+r}{1-r} - r\right\}; \qquad \Theta_4(r) = \tfrac13\left[(2+r^2)K(r) - 2(1+r^2)E(r)\right].$$

From these results we can find $\Pi(r)$ in terms of tabulated functions. A more useful
expression is for the density of beetles per unit area as a function of $r$. This
is given for $r < 1$ by

$$\rho(r) = \frac{1}{2\pi r}\,\Pi'(r). \tag{9}$$

We obtain

$$\rho(r) = \frac{aK(r) + \tfrac12 b\log\dfrac{1+r}{1-r} + c\{K(r) - E(r)\}}{\pi\{2a + b + \tfrac23 c + \bar\tau(1-p)\}}. \tag{10}$$

The simplest, and physically most plausible distribution in the family (4) has $b = c = 0$,
$a = 2/\pi$, i.e. is the rectangular distribution for $\theta$. For this distribution

$$\rho(r) = \frac{2K(r)}{\pi\{4 + \pi\bar\tau(1-p)\}}. \tag{11}$$

A short table of $K(r)$ is given in Table 2 and shows that the density per unit area increases
steadily from the centre. In the third and fourth columns of Table 2 are given the functions
which determine the density in the cases $a = c = 0$ and $a = b = 0$, i.e. when the density
function of $\theta$ is proportional to $|\sin\theta|$ and to $\sin^2\theta$.


Table 2. Density per unit area (unstandardized) for some special cases

    r       K(r)     log{(1+r)/(1-r)}     K(r) - E(r)

   0.0      1.57          0                  0
   0.2      1.59          0.41               0.03
   0.4      1.64          0.85               0.13
   0.6      1.75          1.39               0.33
   0.8      2.00          2.20               0.72
   0.9      2.28          2.94               1.11
   0.95     2.59          3.66               1.49
   1.00      ∞             ∞                  ∞
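
The entries of Table 2 can be reproduced directly from the definitions (5) and (6). The sketch below uses a plain midpoint rule rather than tabulated elliptic integrals; the function names are illustrative only.

```python
import math


def _midpoint(f, lo, hi, n=20000):
    """Midpoint-rule approximation to the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def ellip_k(r):
    """Complete elliptic integral of the first kind, K(r), as in (5)."""
    return _midpoint(lambda t: 1.0 / math.sqrt(1.0 - (r * math.sin(t)) ** 2),
                     0.0, math.pi / 2)


def ellip_e(r):
    """Complete elliptic integral of the second kind, E(r), as in (6)."""
    return _midpoint(lambda t: math.sqrt(1.0 - (r * math.sin(t)) ** 2),
                     0.0, math.pi / 2)


def table2_row(r):
    """The three tabulated functions of Table 2 for a given r < 1."""
    k = ellip_k(r)
    return k, math.log((1.0 + r) / (1.0 - r)), k - ellip_e(r)


for r in (0.2, 0.4, 0.6, 0.8, 0.9, 0.95):
    print(r, [round(v, 2) for v in table2_row(r)])
```

All three functions increase with $r$, so whichever member of the family (4) is adopted, the density rises from the centre towards the circumference.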


In addition to the continuous distribution over the interior of the unit circle, there is a
line density of probability on the circumference of amount per unit length

$$\frac{1}{2\pi}\,\frac{\bar\tau(1-p)}{2a + b + \tfrac23 c + \bar\tau(1-p)}. \tag{12}$$

The general conclusions to be drawn fr 

of 0 is uniform, i.e. if the bee 
Instead of an analytical treatment of the case of a square container we have been
obliged to use numerical methods. We assumed that paths may start and end only at a finite
number of points along the sides of the square. We have made the further assumptions that
$\theta$ is uniformly distributed and that there is no motion along the boundary, i.e.
$\tau = 0$. By a tedious procedure it is possible to work out the equilibrium distribution
of beetles over the square. A brief outline of the steps in these calculations follows.

(i) Eight boundary points were symmetrically placed at distances $\tfrac12, \tfrac32, \ldots, \tfrac{15}{2}$
from the ends of the sides of a square of side 8. These points were then numbered
1, 2, ..., 8 cyclically.

(ii) The quantities giving the probability that a path starting from a point $j$ on one side
would strike one of the other sides within a distance $\tfrac12$ of a point $k$ were calculated
on the assumption of a uniform distribution for $\theta$. It was then assumed that any path,
which ended within a distance $\tfrac12$ of a point $k$, actually ended at $k$.

(iii) Symmetry considerations show that the final distributions can be obtained by
considering paths starting from points 1, 2, 3, 4 only, on one side. Therefore, a Markov
chain with four states was involved and a system of linear equations determined the
equilibrium probability that a path starts from a position with a specified number.

(iv) The probability attached to each possible path was then calculated using (iii) and (ii).

(v) The area of the square was divided into suitable areas and the length of each path
within each area tabulated. When multiplied by the correct probability from (iv) these
lengths gave the numerator in the expression analogous to (1). The mean length of all paths
was determined in a similar way, and hence the equilibrium probabilities of finding a beetle
in the various areas of the square were determined.
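
Steps (i) to (iii) can be sketched in a few lines. The code below is only one reading of the outline above: it assumes that $\theta$ is measured from the inward normal and is uniform on $(-\pi/2, \pi/2)$, so that the probability of reaching a unit cell on another side is the angle which that cell subtends at the starting point divided by $\pi$. It finds the equilibrium distribution by repeated multiplication rather than by solving the linear equations. The function names are illustrative.

```python
import math

SIDE = 8           # side of the square, as in step (i)
HALF = SIDE // 2   # four distinct point classes after folding by symmetry


def point_class(m):
    """Fold the cell index m (0..7 along a side) onto the classes 0..3."""
    return min(m, SIDE - 1 - m)


def transition_matrix():
    """P[c][d]: chance that a path leaving a class-c point ends at class d."""
    P = [[0.0] * HALF for _ in range(HALF)]
    for c in range(HALF):
        x = c + 0.5                        # starting point on the bottom side
        for m in range(SIDE):              # unit cell [m, m + 1] on each side
            top = math.atan((m + 1 - x) / SIDE) - math.atan((m - x) / SIDE)
            right = math.atan2(SIDE - x, m) - math.atan2(SIDE - x, m + 1)
            left = math.atan2(x, m) - math.atan2(x, m + 1)
            P[c][point_class(m)] += (top + right + left) / math.pi
    return P


def stationary(P, iterations=500):
    """Equilibrium distribution of the four-state chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


P = transition_matrix()
start_probs = stationary(P)
```

Steps (iv) and (v), which weight the length of each path within each area of the square, are not attempted here.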


Table 3. Equilibrium density of beetles over a square, for a simple model

                   First quarter          Second quarter

                  0.0131    0.0125    0.0135    0.0176
                  0.0125    0.0139    0.0144    0.0167
Third quarter     0.0135    0.0144    0.0144    0.0183
                  0.0176    0.0167    0.0183    0.0227

The probabilities found in this way are shown in Table 3. The distribution is only given for
one quarter of the square because of the symmetry involved.

Notice that the general features of Table 3 are the same as those of the experimental
results in Table 1. That is, the density is fairly constant in the centre of the square and rises
steadily towards the sides. Along each edge the density rises to maxima in the corners.
However, there are two discrepant features. First, there are puzzling minor irregularities
in the computed values in Table 3. For example, the density in the centre of the square is
slightly greater than that below or to the side of the centre. There are three possible
explanations of these irregularities.

(i) They may represent real features of the random process, in the sense that they would
be present in the solution of the correct 'continuous' model in which paths are not restricted.

(ii) They may be the result of numerical errors that have escaped the various checking
schemes used.
(iii) They may be a consequence of the 'discrete' approximation adopted. We have
replaced the whole set of paths starting from points in an interval on one side of the square,
and going to points in an interval on another side of the square, by one path. This path
will, in general, contribute zero probability to regions that, in the continuous scheme, would
receive appreciable probability. While this unbalance should tend to even itself out when
there are many paths, it is reasonable to expect some irregularities to remain.

We are of the opinion that (i) and (ii) are unlikely, and that (iii) is the correct explanation.

The second discrepancy is that the magnitudes of the trends in Table 3 do not agree at all
well with the experimental values. The corner density in Table 3 is less than twice the density
in the centre, whereas the corresponding ratios for the experimental densities are about
4 for females and 15 for males. The theoretical trend along the edge from the centre is about
correct for females and somewhat too small for males. Some of this lack of agreement may
be accounted for by the kind of assumption, made in §5, that a beetle which has reached the
boundary tends to remain there for some time before choosing a new path within the flour.


determines the average length of straight sections of path. 
ndary starting angles 0. 


pecial hypothesis that the beetles’ 
and distant from the surface. 


v 
THE $\chi^2$ GOODNESS-OF-FIT TEST FOR NORMAL DISTRIBUTIONS


By G. S. WATSON


Australian National University


if, in making a y? 
continuous distribution, the 


cell frequencies, the statistic 


rapidly as k increases, 


1 and A, which tend to zero 
tabular points of rg 


used in practice so that the 
han 1%, 


1. INTRODUCTION
Fisher (1924) first established the fundamental theorem of $\chi^2$, that
the $X^2$ statistic is asymptotically distributed as $\chi^2$, with degrees of freedom equal
to the number ($k$) of classes less one less the number ($s$) of parameters estimated.
text (1946) gives the most rigorous p; 
lated for a multinomial situation, i. 


ient estimators (which a, 
the result follows, 
Tn fitting a continuou 


8 distribution with density f (v; 6,, 0,, 
(Z; Za), 3 


+++), kelass i 
++» (Zk-1 0) could be used. Then ss intervals (=, Z), 


[2n 


(1) 
x are the observed freq ies i 


SES d 
152; ...,4 0 are 
may be used, Provided Cramé ? ^N are used 
ersal Practice in this Situation 
stimates 0,550, os OF 0,6, .... ample values 


* Followin 


g Cochran (1952), 
the well-kno 


wn distribution (or 
the cell boundaries $Z_1, Z_2, \ldots, Z_{k-1}$ are fixed and $\theta_1, \theta_2, \ldots, \theta_s$ were estimated efficiently
from the sample $x_1, x_2, \ldots, x_N$ and their result showed that $X^2$ was distributed as

$$\chi^2_{k-s-1} + \lambda_1 y_1^2 + \ldots + \lambda_s y_s^2,$$

where $\lambda_1, \ldots, \lambda_s$ are certain latent roots, all in the interval (0, 1) and $y_1, \ldots, y_s$ are standard
normal variables, independent of each other and of $\chi^2_{k-s-1}$. This shows explicitly why $X^2$
is greater than $\chi^2_{k-s-1}$, the fact observed by the earlier writers. They applied their
general theorem to the normal distribution and showed that the numerical effects may be
important, e.g. that the significance level may well be twice the tabular value.

Since, with Chernoff & Lehmann's formulation, the distribution of $X^2$ depends on the
unknown parameters, it is hard to see the practical significance of their result. If, however,
the cell probabilities $p_1, p_2, \ldots, p_k$ are prescribed and the class intervals chosen so that,
always,

$$p_l = \int_{Z_{l-1}}^{Z_l} f(x; \hat\theta_1, \hat\theta_2, \ldots)\,dx, \tag{2}$$

then it will be shown below for the case of the normal, that the distribution of $X^2$ does not
depend on the values of $\theta_1$, $\theta_2$, but it is still not that of $\chi^2$; in fact, it is a special case of that
given by Chernoff & Lehmann. The relevance of this formulation—fixed cell probabilities 
rather than fixed cell boundaries—to practice is not immediately obvious. Before defending 
it, a survey of current practical methods is necessary. 

Even after restricting discussion to the normal distribution, it is not easy to define the
current practices for the choice of class intervals. The most usual method may be seen in
the examples of Cramér (1946, §§30-34). The intervals, except for the two extreme ones,
are of equal length. This has the advantage that the determination of the class frequencies
is easy if the end-points are simply related to the scale of measurement, and also makes the
histogram simple to assess visually. The number of class intervals is restricted above by the
requirement that the expected class frequencies are not to be 'too small' (i.e. to avoid
deficiencies in the asymptotic theory due to small sample effects) and below by consideration
of loss of sensitivity (i.e. power). While there is no rule for positioning the class
boundaries, it is clear that their position is related to the mean of the sample, and that the
length of the class intervals is conditioned by the variance. Often this means only that the
class intervals are determined by inspection of the data. Since the data varies from sample to
sample, the class intervals may well do so. The restrictions on the expected class frequencies
are restrictions on the probability content of the cells, although they do not
require the probability contents of the cells to be fixed. With a large mass of data, possibly
the simplest method of obtaining class intervals with 'reasonable' probability contents
(i.e. 'reasonable' expected frequencies), is to compute the sample mean and variance, $\bar x$
and $s^2$, and construct intervals centred on $\bar x$ with lengths of some chosen multiple of $s$.
This requires no judgement and the boundaries can be altered slightly, if necessary, to
coincide nearly with the scale of measurement. If this is done, we have exactly, or
approximately, the formulation (2). Another case where our formulation is exactly correct
will be seen when the work of Mann & Wald (1942) is discussed. Thus our formulation is
closely related to practical methods.

To arrive at a rule which can be analysed mathematically and is close to practical
procedure, the following rule is proposed. For $k$ even, let the sample mean be a class
boundary, and let an equal number of intervals be marked out on either side of it of a length
equal to some multiple of the standard deviation of the sample. For $k$ odd, let the sample
mean be at the centre of a class interval and let an equal number of intervals be marked
out on either side of it of length equal to the central interval and to some multiple of the
sample standard deviation. A strict symmetry of the class intervals about the sample mean
is used in these rules as it leads to mathematical simplicity. In practice, of course, this
symmetry is usually only approximate as it conflicts with the desire to have the class
boundaries simply related to the units of measurement.
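
The proposed rule is simple to mechanize. The sketch below returns the $k - 1$ finite class boundaries for intervals of length $hs$ placed symmetrically about the sample mean, and, for comparison, boundaries giving equal probability content under the fitted normal distribution. The function names are illustrative, and the use of the divisor $N$ in computing $s$ is an assumption, since the choice of divisor does not affect the large-sample theory.

```python
from statistics import NormalDist, mean, pstdev


def equal_length_boundaries(sample, k, h):
    """k - 1 boundaries, spaced h*s apart, symmetric about the sample mean.

    For k even the sample mean is itself a boundary; for k odd it lies
    at the centre of the middle class interval.
    """
    xbar, s = mean(sample), pstdev(sample)   # divisor N assumed for s
    offsets = [j - (k - 2) / 2 for j in range(k - 1)]
    return [xbar + h * s * o for o in offsets]


def equal_probability_boundaries(sample, k):
    """k - 1 boundaries giving probability 1/k to every class interval."""
    xbar, s = mean(sample), pstdev(sample)
    unit_normal = NormalDist()
    return [xbar + s * unit_normal.inv_cdf(l / k) for l in range(1, k)]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
even_rule = equal_length_boundaries(data, k=4, h=1.0)
odd_rule = equal_length_boundaries(data, k=5, h=0.5)
quartiles = equal_probability_boundaries(data, k=4)
```

In both functions the two extreme class intervals extend to infinity, so only the interior boundaries are returned.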

Mann & Wald (1942) have argued that for reasons of power, the class intervals should 
have equal probability content. They have given a formula for the number of intervals k, 


in terms of N and the significance level to be used, again based on power. These rules are 
proved for an arbitrary distribution, without parameters requiring estimation. This restric- 


qual probability intervals and this part of the 
ent problem. In fact, it also has the merit 

rs in X? are equal. Clearly, the formulation 
»2,...,k). It is easily analysed by the present 
about the sample mean. The formula 


, as will be seen later when the eq 
rules are examined. 


As has been remarked by Chernoff & Lehmann and the earlier writers, the effect, on the 
distribution of X? of diffe i i 


2. DISTRIBUTION OF $X^2$, GENERAL CASE

In the notation of §1, let $p_1, p_2, \ldots, p_k$ be the areas desired above the class intervals, as
defined in (2) and let $n_1, n_2, \ldots, n_k$ ($\sum n_l = N$) be the class frequencies; then the statistic is

$$X^2 = \sum_{l=1}^{k} \frac{(n_l - Np_l)^2}{Np_l}. \tag{3}$$
For this purpose, let indicator variables be defined as follows:

$$Y_i(l) = \begin{cases} 1 & \text{if } x_i \text{ falls in } (Z_{l-1}, Z_l), \\ 0 & \text{if } x_i \text{ does not fall in } (Z_{l-1}, Z_l), \end{cases} \tag{4}$$

where $Z_0 = -\infty$, $Z_k = +\infty$. Then

$$n_l = \sum_{i=1}^{N} Y_i(l), \tag{5}$$

where the joint distribution of $Y_1(l), Y_2(l), \ldots, Y_N(l)$ is unchanged by permutation of the
subscripts. Thus

$$E(n_l) = N\,P(Y_1(l) = 1), \tag{6}$$

$$E(n_l^2) = N\,P(Y_1(l) = 1) + N(N-1)\,P(Y_1(l)\,Y_2(l) = 1), \tag{7}$$

$$E(n_l n_m) = N(N-1)\,P(Y_1(l)\,Y_2(m) = 1). \tag{8}$$

Since, by (5), the $n_l$ ($l = 1, \ldots, k$) are expressed as sums of $N$ variables, it will in general
be true when $N$ tends to infinity that the joint distribution of $n_1, \ldots, n_k$ is multivariate
normal with a mean vector and covariance matrix obtainable from (6), (7), (8). $X^2$ may be
written as

$$X^2 = \sum_l \frac{(n_l - E(n_l))^2}{Np_l} + 2\sum_l \frac{(n_l - E(n_l))(E(n_l) - Np_l)}{Np_l} + \sum_l \frac{(E(n_l) - Np_l)^2}{Np_l}. \tag{9}$$

The method will be to show from the joint distribution of the $n_l$'s that the last two terms
on the right-hand side of (9) contribute nothing to the limiting ($N = \infty$) distribution of $X^2$,
while the first term is distributed as a quadratic form in normal variables. For this it is
necessary to determine $P(Y_1(l) = 1)$ and $P(Y_1(l)\,Y_2(m) = 1)$ to terms of order $O(N^{-1})$.
This programme will now be carried out for the normal distribution. It is clearer to
consider first the case when only the mean requires estimation, and second the case
when both the mean and the variance require estimation.


3. NORMAL DISTRIBUTION, MEAN UNKNOWN, VARIANCE KNOWN 


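Let the $z_1, z_2, \ldots, z_{k-1}$ be such that

$$p_l = \int_{z_{l-1}}^{z_l} \phi(t)\,dt, \tag{10}$$

where $z_0 = -\infty$, $z_k = +\infty$, $\phi(t) = (2\pi)^{-1/2}\exp(-\tfrac12 t^2)$. If the unknown mean is estimated by
the mean $\bar x$ of the sample under test, it is clear that the $Z_l$ of §2 are simply $z_l + \bar x$. Thus
$n_l$ is the number of $x_1, x_2, \ldots, x_N$ in $(z_{l-1} + \bar x, z_l + \bar x)$, which is the same as the number of
$x_1 - \bar x, \ldots, x_N - \bar x$ in $(z_{l-1}, z_l)$. The indicators will be defined by

$$Y_i(l) = \begin{cases} 1 & \text{if } x_i - \bar x \text{ falls in } (z_{l-1}, z_l), \\ 0 & \text{if } x_i - \bar x \text{ does not fall in } (z_{l-1}, z_l). \end{cases} \tag{11}$$

It is convenient to use $t_i = (x_i - \bar x)/\sqrt{(1 - 1/N)}$ since then the $t_i$ are jointly normal,
with zero means, unit variances and correlations $\rho = -1/(N-1)$. Thus the joint density of
$t_i$ and $t_j$, to the required order, is given by

$$\frac{1}{2\pi}\exp\{-\tfrac12(t_i^2 + t_j^2)\}\,(1 + \rho\,t_i t_j). \tag{12}$$

To make the subsequent formulae simpler, it is convenient to introduce the notation

$$\Phi_r(l, N) = \int_{z_{l-1}(1-1/N)^{-1/2}}^{z_l(1-1/N)^{-1/2}} t^r \phi(t)\,dt. \tag{13}$$

For brevity the arguments $l$ and $N$ are only shown when there is any doubt. From (11),
$P(Y_1(l) = 1)$ and $P(Y_1(l)\,Y_2(m) = 1)$ are obtained by integrating (12). Thus

$$P(Y_1(l) = 1) = \Phi_0(l), \qquad P(Y_1(l)\,Y_2(m) = 1) = \Phi_0(l)\Phi_0(m) + \rho\,\Phi_1(l)\Phi_1(m). \tag{14}$$

Defining

$$\psi_r(l) = z_l^r\,\phi(z_l) - z_{l-1}^r\,\phi(z_{l-1}) \quad (r = 0, 1), \tag{15}$$

it is easily seen that, to the orders in which they will be required,

$$\Phi_0 = p_l + \frac{\psi_1}{2N}, \qquad \Phi_1 = -\psi_0, \qquad \Phi_2 = p_l - \psi_1. \tag{16}$$

Hence

$$E(n_l) = Np_l + \tfrac12\psi_1(l), \quad \operatorname{var}(n_l) = N(p_l - p_l^2 - \psi_0(l)^2), \quad \operatorname{covar}(n_l, n_m) = N(-p_l p_m - \psi_0(l)\psi_0(m)). \tag{17}$$

Putting $\psi_0' = [\psi_0(1), \ldots, \psi_0(k)]$, $1' = [1, \ldots, 1]$ ($1 \times k$) and $D(p_l) = D$ for the $k \times k$ matrix
with the $p_l$ down the main diagonal, the covariance matrix of the $n_l$'s may be written, to this
order, as

$$\Lambda = N(D - D11'D - \psi_0\psi_0'). \tag{18}$$

Thus, asymptotically, the second last term of (9) is a normal variable with zero mean and
variance

$$\frac{1}{N}\,\psi_1' D^{-1}(D - D11'D - \psi_0\psi_0')D^{-1}\psi_1,$$

and so converges stochastically to zero as $N \to \infty$, while the last term is equal to
$\psi_1' D^{-1}\psi_1/4N$ and also tends to zero. The first term is distributed as

$$\sum_{l=1}^{k} \lambda_l y_l^2, \tag{19}$$

where $y_1, y_2, \ldots, y_k$ are independent standard normal deviates and the $\lambda_l$ are the latent
roots of

$$\frac{1}{N}\,D^{-1/2}\Lambda D^{-1/2} = I - D^{1/2}11'D^{1/2} - D^{-1/2}\psi_0\psi_0' D^{-1/2}. \tag{20}$$

Since (20) can be written as the sum of the matrices

$$\left[I - D^{1/2}11'D^{1/2} - \frac{D^{-1/2}\psi_0\psi_0' D^{-1/2}}{\psi_0' D^{-1}\psi_0}\right] + (1 - \psi_0' D^{-1}\psi_0)\,\frac{D^{-1/2}\psi_0\psi_0' D^{-1/2}}{\psi_0' D^{-1}\psi_0},$$

the first of which is idempotent of rank $k - 2$, the second of which is a multiple, $1 - \psi_0' D^{-1}\psi_0$,
of an idempotent with a zero product with the first matrix, the roots $\lambda_l$ of (20) are

$$1 - \psi_0' D^{-1}\psi_0, \qquad 1 \ (k - 2 \text{ times}). \tag{21}$$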
y 
Thus the distribution of $X^2$ is here

$$\chi^2_{k-2} + (1 - \psi_0' D^{-1}\psi_0)\,y^2. \tag{22}$$

The root $1 - \psi_0' D^{-1}\psi_0$ is shown in Tables 1 and 2 for various values of $k$ and types of class
intervals. It lies between 0 and 1 and tends to zero as $k \to \infty$ provided the probability
content of all the classes tends to zero. A heuristic proof of this limit may easily be given
since

$$\psi_0' D^{-1}\psi_0 = \sum_l \frac{\psi_0(l)^2}{p_l} \to \int_{-\infty}^{\infty} t^2\phi(t)\,dt = 1. \tag{23}$$

To study the effect of using the mean $m$ of an independent sample of $N$, instead of $\bar x$, to
estimate $\mu$, consider new variables $t_i = x_i - m$. They are jointly normal with zero means,
variances $1 + 1/N$ and correlations $1/(N + 1)$. Here $n_l$ will be the number of $t_i$ which fall in
$(z_{l-1}, z_l)$. Introducing suitable indicators to achieve this, the previous argument may be
repeated and it leads to

$$E(n_l) = Np_l - \tfrac12\psi_1(l), \quad \operatorname{var}(n_l) = N(p_l - p_l^2 + \psi_0(l)^2), \quad \operatorname{covar}(n_l, n_m) = N(-p_l p_m + \psi_0(l)\psi_0(m)). \tag{24}$$

Thus, in this case, $X^2$ is distributed as

$$\chi^2_{k-2} + (1 + \psi_0' D^{-1}\psi_0)\,y^2, \tag{25}$$

which, for large $k$, is equal to $\chi^2_{k-2} + 2y^2$.

The results (22), (25) and the standard result when $\mu$ is known may be written together,
when $k$ is large, as

$$X^2 = \begin{cases} \chi^2_{k-2} + 2y^2 & \text{(mean estimated from an independent sample of } N), \\ \chi^2_{k-1} & \text{(mean known)}, \\ \chi^2_{k-2} & \text{(mean estimated from sample)}. \end{cases} \tag{26}$$

The results (26) could be interpreted roughly to mean that the correlation (or the conformity
of the sample to the hypothesis) from using the sample mean is 'worth' two degrees of
freedom in overcoming the effect of variability which is 'worth' one degree of freedom.


4. NORMAL DISTRIBUTION, MEAN AND VARIANCE UNKNOWN

First it will be supposed that the mean and variance of the normal distribution are estimated
by $\bar x$ and $s^2$, found from the sample under test. In this case $Z_1, Z_2, \ldots, Z_{k-1}$ will be found from

$$p_l = \int_{Z_{l-1}}^{Z_l} \frac{1}{s\sqrt{(2\pi)}}\exp\{-(x - \bar x)^2/2s^2\}\,dx,$$

so that $Z_l = \bar x + z_l s$. Thus $Z_{l-1} \le x_i < Z_l$ is the same as

$$\frac{Z_{l-1} - \bar x}{s} \le \frac{x_i - \bar x}{s} < \frac{Z_l - \bar x}{s}, \quad \text{i.e.} \quad z_{l-1} \le \frac{x_i - \bar x}{s} < z_l.$$


The variates t; = (c; —z)/s have an awkward distribution with a finite range 


C-4N,AN) (14- (N — Aaja, 
By an argument, often used for serial correlations (see, for example, Watson (1955)),

$$E(t_1^a t_2^b) = \frac{E\{(x_1 - \bar x)^a (x_2 - \bar x)^b\}}{E(s^{a+b})},$$

so that all the joint moments of $t_1$ and $t_2$ are available. An asymptotic expansion for $f(t_1, t_2)$
can be found by writing (with $H_i(x)$, the $i$th Hermite polynomial as defined by Cramér
(1946))

$$f(t_1, t_2) = \phi(t_1)\,\phi(t_2)\sum_i\sum_j a_{ij}\,H_i(t_1)\,H_j(t_2)$$

and evaluating the coefficients $a_{ij}$ up to and including those of order $N^{-1}$. Since

$$f(t_1, t_2) = f(t_2, t_1),$$

The first four moments of f, (and £ 


we have lij = s. 2) are 


2 
0, 49 0 3 EE d 
( xu) 


1 —1 2 2 
hil =>——, = mans a 
while fy Ni’ |] (1+ 2) (1 Wai): 


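Hence (to order required) $a_{00} = 1$, $a_{40} = -1/4N$, $a_{11} = -1/N$, $a_{22} = -1/2N$;
$a_{ij} = 0$, otherwise, as may be shown by further calculations, so that

$$f(t_1, t_2) = \phi(t_1)\,\phi(t_2)\left\{1 - \frac{1}{N}H_1(t_1)H_1(t_2) - \frac{1}{2N}H_2(t_1)H_2(t_2) - \frac{1}{4N}\bigl(H_4(t_1) + H_4(t_2)\bigr)\right\}. \tag{27}$$

To apply the method of §2, indicators $Y_i(l)$ are defined as in (11), using $(x_i - \bar x)/s$
instead of $x_i - \bar x$. From (27), it follows that

$$P(Y_1(l) = 1) = \Phi_0 - \frac{1}{4N}(\Phi_4 - 6\Phi_2 + 3\Phi_0),$$

$$P(Y_1(l)\,Y_2(m) = 1) = \Phi_0(l)\Phi_0(m) - \frac{1}{N}\Phi_1(l)\Phi_1(m) - \frac{1}{2N}\bigl(\Phi_2(l) - \Phi_0(l)\bigr)\bigl(\Phi_2(m) - \Phi_0(m)\bigr)$$
$$\qquad - \frac{1}{4N}\Bigl\{\bigl(\Phi_4(l) - 6\Phi_2(l) + 3\Phi_0(l)\bigr)\Phi_0(m) + \Phi_0(l)\bigl(\Phi_4(m) - 6\Phi_2(m) + 3\Phi_0(m)\bigr)\Bigr\}.$$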
Introducing these formulae into (6), (7), (8) and using (16) to simplify them, we find that
$E(n_l)$ differs from $Np_l$ by terms of $O(1)$ so that the last term in (9) vanishes as $N \to \infty$,
and that the covariance matrix of the $n_l$ is defined by

$$\operatorname{var}(n_l) = N(p_l - p_l^2 - \psi_0(l)^2 - \tfrac12\psi_1(l)^2),$$
$$\operatorname{covar}(n_l, n_m) = N(-p_l p_m - \psi_0(l)\psi_0(m) - \tfrac12\psi_1(l)\psi_1(m)).$$

Thus, in matrix form, the covariance matrix may be written as

$$\Lambda = N(D - D11'D - \psi_0\psi_0' - \tfrac12\psi_1\psi_1'), \tag{28}$$

which is the same as (18) except for the additional term in $\psi_1$.

Thus the limiting distribution of $X^2$ is, in this case, of the form (19) where the $\lambda_l$ are the
latent roots of

$$\frac{1}{N}\,D^{-1/2}\Lambda D^{-1/2} = I - D^{1/2}11'D^{1/2} - D^{-1/2}\psi_0\psi_0' D^{-1/2} - \tfrac12 D^{-1/2}\psi_1\psi_1' D^{-1/2}, \tag{29}$$

so that the $\mu_l = 1 - \lambda_l$ are the roots of

$$D^{1/2}11'D^{1/2} + D^{-1/2}\psi_0\psi_0' D^{-1/2} + \tfrac12 D^{-1/2}\psi_1\psi_1' D^{-1/2} = \bigl[D^{1/2}1,\ D^{-1/2}\psi_0,\ 2^{-1/2}D^{-1/2}\psi_1\bigr]\begin{bmatrix} 1'D^{1/2} \\ \psi_0' D^{-1/2} \\ 2^{-1/2}\psi_1' D^{-1/2} \end{bmatrix}.$$

But, by a well-known matrix result, the non-zero latent roots of $AA'$, where $A$ is an arbitrary
matrix, are the same as those of $A'A$. Since

$$1'D1 = 1, \qquad (1'D^{1/2})(D^{-1/2}\psi_0) = 1'\psi_0 = 0, \qquad (1'D^{1/2})(D^{-1/2}\psi_1) = 1'\psi_1 = 0,$$

the $\mu_l$ are the latent roots of

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & \psi_0' D^{-1}\psi_0 & 2^{-1/2}\psi_0' D^{-1}\psi_1 \\ 0 & 2^{-1/2}\psi_1' D^{-1}\psi_0 & \tfrac12\psi_1' D^{-1}\psi_1 \end{bmatrix}. \tag{30}$$

Thus, the roots $\lambda_l$ of (29) are 1 ($k - 3$ times), $1 - \mu_1$, $1 - \mu_2$, where $\mu_1$ and $\mu_2$ are the roots of (30).
For arbitrary systems of class intervals $\psi_0' D^{-1}\psi_1$ is not zero and the quadratic equation

$$\begin{vmatrix} \psi_0' D^{-1}\psi_0 - \mu & 2^{-1/2}\psi_0' D^{-1}\psi_1 \\ 2^{-1/2}\psi_1' D^{-1}\psi_0 & \tfrac12\psi_1' D^{-1}\psi_1 - \mu \end{vmatrix} = 0$$

must be solved. If, however, the intervals $(-\infty, Z_1), (Z_1, Z_2), \ldots, (Z_{k-1}, \infty)$ are symmetrically
placed with respect to the sample mean, so that the intervals $(-\infty, z_1), (z_1, z_2), \ldots, (z_{k-1}, \infty)$
are symmetrically placed with respect to zero, inspection shows that $\psi_0' D^{-1}\psi_1$ is always zero.
In this case, the roots of $(1/N)D^{-1/2}\Lambda D^{-1/2}$ are therefore

$$\lambda_1 = 1 - \psi_0' D^{-1}\psi_0, \qquad \lambda_2 = 1 - \tfrac12\psi_1' D^{-1}\psi_1, \qquad \lambda_l = 1 \ (k - 3 \text{ times}). \tag{31}$$

Since the second term of (9) vanishes stochastically as $N \to \infty$, we have the result that the
limiting distribution of $X^2$ is that of

$$\chi^2_{k-3} + \lambda_1 y_1^2 + \lambda_2 y_2^2. \tag{32}$$


m an independent sample, are used. In this case 
tı—T P 
“=a 6-123 
have, as has been shown by Cornish (1954), the joint density 
FIG + 1)] 0 - 924 14,9 imn 
THO -1]sqy - 1 ( Mmm 
= (f 2pt,t, +8)/(1 — pt), To 0(N3) 
eio l e2 : 
ty, te) = ii 2 (33) 
S(t, te) 2n(1—pai +N 2m(1 — pay C7 2+ 20%) 
rs ¥,(1) in the obvious Way, 
P(K(Y) ¥(m) = 1) = PY, 


> 


where p = — 1/(N — 1,Q 


so that, defining new indicato 


() Y(m) —1, when c? is known) 
l òa fm e-Miu) 
tele fe “Ba {= +) HH dnd, 


where, in the integral, N has been put equal to co since this term is already O(N-1), Thus 
P(Y,(1) ¥,(m) = 1) = PYY ¥,(m)=1, when o2 is known) 


* | 940,0)— 0,00, 90 Pen 2040 Dam) + db,(I) Sim) (34) 
Since Fl) = p(t) + yo -2-14 a, 
1 
PRM=1) = 0+ s [- 0, e, EEG) (35) 
Introducing (34) and 


(35) into (6 


); (7), (8) and using the expang; 
found to be Tom 


‘iance 
matrix of the n, is here 8 (16), the covar 


A, — I-11/D.— D>; 
Following the previous method, X? h i 


where 


(36) 
$$\lambda_2 = 1 - \psi_0' D^{-1}\psi_0, \qquad \lambda_3 = 1 + \tfrac12\psi_1' D^{-1}\psi_1, \tag{37}$$

provided $\psi_0' D^{-1}\psi_1 = 0$, for which symmetry of the intervals $(z_{l-1}, z_l)$ about zero is a sufficient
condition. Since $\lambda_3 \to 2$, as all the intervals tend to zero, we have then

$$X^2 = \begin{cases} \chi^2_{k-3} + 2y_3^2 & \text{(variance estimated independently of the sample)}, \\ \chi^2_{k-2} & \text{(variance known)}, \\ \chi^2_{k-3} & \text{(variance estimated from the sample)}. \end{cases} \tag{38}$$

In each case of (38) it is assumed that the sample mean has been used. Comparing (38)
with (26), it is seen that the variance behaves in the same way as the mean, in this regard.


5. NUMERICAL RESULTS 
Only the practical case where the mean and variance have been estimated by the sample
mean and variance will now be considered. Restating the results (31), (32), for the simple
case of symmetrically placed intervals, we have

$$X^2 \sim \chi^2_{k-3} + \lambda_1 \chi^2_1 + \lambda_2 \chi^2_1, \qquad (39)$$


O$ yi0, p112 HO. 


where à =l 5 
iai Pr 2113 Pr 


and y^, yr, are defined in (15). 


In Table 1, the values of $\lambda_1$ and $\lambda_2$, $E(X^2)$ and $k-3$ are given for various $k$ when class
intervals of equal probability content are used, in the spirit of Mann & Wald (see § 1).
In Table 2, the same quantities are given for the rule of § 1 using intervals of equal length.
A further column gives $h$, the length of the interval $(z_{i-1}, z_i)$ for $i = 2, \ldots, k-1$.
It is now of interest to compare our results with those of Chernoff & Lehmann. As described
in § 1 they used a fixed system of class intervals $(-\infty, Z_1), (Z_1, Z_2), \ldots, (Z_{k-1}, \infty)$ and a normal
Table 1 
T 
k X A; k-3 E(X?) 
" wens 1 e 0-363 
3 0-207 0:773 0 0-980 
4 0-139 0-619 L 1-758 
" 0-103 0-592 2 2-695 
6 0-081 0-459 3 3-540 
1 Onis 0-297 7 7-839 
m 0 0 k-3 k—3 
Table 2 
k h ey ^ "3 AES 
i 1-000 -1 0-363 
[z|m | Be] i: m- 
i 1 Ere 0-459 1 1-577 
n i eM 0-738 1 1:912 
4 04 0-201 0-797 : EG 
5 1-0 0-088 0-272 = Boe) 
5 0-5 0-116 0-684 $ a 
5 04 0-145 0-676 : Een 
6 1-0 0-080 0-184 E 
6 0-5 0-077 Lye s: 4d 
10 1-0 0-077 PS 7 7-128 
10 0-5 0-025 0-103 TEM s 
oo iiec 0 0 ih -3 
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population with mean y and variance o?, Their result is of thesame fem as = - Aa € 
A, are again the roots of a certain matrix. If Z,,Z,,..., Z,., are expressed in 

21525, ..., Zy. 4 by the relations Pese i-a vr 

their results should be comparable with ours when one sets $\mu = 0$, $\sigma = 1$ in the matrix given
by Chernoff & Lehmann for the roots $\mu_i$ ($\lambda_i = 1 - \mu_i$). In fact, the matrices are then identical.
This is also seen arithmetically in Table 2; the values of $\lambda_1$ and $\lambda_2$ for $k = 4$, $h = 0.4$ are the
same as those in Chernoff & Lehmann's example where the intervals are $(-\infty, -1)$, $(-1, 0)$,
$(0, 1)$, $(1, \infty)$ and $\bar{x} = 0$, $s^2 = 2.5$.


This identity of our results with those of Chernoff & Lehmann when 4 — 0, 0? = 1 has 


at, asymptotically, there is no difference between using 


, Chernoff & Lehmann’s general theorem 


ost if their recommended number of classes 
is halved. Our results, especially Tables 1 and i 
the significance points of x? large to avoid under- 


form in normal variables with 
191 Asy2. The method of Pitman & Robbins (1949) 
i e. But a variant* 


à k is not much smaller than 10 the probability 
that X? is greater than u is given accurately by 
e-tu uli a 
l4 74 0 39 
Tak) 2/3 ra - 356 xil tata) de 
where 


% = (2—3(», +n) (35 —1), 

7s = [4+ 35 +08) — (v, +n) +49] ($k—1) (43k— " 
¥ = 2À,/(1 SAh 

Va = 2A,/(1 —A,). | 

vals may be examined by first com 

ction and then using ( 


say) tabular value of vilem Dependin 
this probability from 0-05, th 


| (40) 


puting A, and A, from the 
(40) to find the probability 
8 on what is considered to 
© System of intervals will be judged 


* Suggested to the author in a letter from J. A. Macdonald.
It is clear that no hard and fast rule will find acceptance, but it does seem worth while
to suggest a working rule of the very simplest kind. Ten intervals is a convenient round
number of intervals. Then the probability of $X^2$ exceeding $\chi^2_7(5\,\%) = 14.067$ may be found,
from (39) and (40), to be 0.057 for the case $k = 10$ in Table 1 and to be 0.054 and 0.052 for
the cases $k = 10$, $h = 1, 0.5$ (respectively) in Table 2. Thus, if ten intervals are formed from
almost any kind of subdivision of $(-\infty, \infty)$, one can be sure that the true significance level is
between 0.05 and 0.06, and usually much nearer to 0.05. Our rule is therefore to use at
least ten intervals, none of which should have 'small' expected frequencies. This latter
provision is made to reduce the possibility of small sample effects invalidating the asymptotic
results on which the rule rests.
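The tail probabilities quoted for the working rule can be checked roughly by simulation. The sketch below is not the method of the paper (which uses formula (40)); it simply draws from a $\chi^2_{k-3}$ variable plus two weighted squared standard normal variables and counts exceedances of 14.067. The function name, the number of draws and the seed are illustrative choices, and the two weights are the values read from the row $k = 10$, $h = 0.5$ of Table 2.

```python
import random

def tail_probability(k, lam1, lam2, threshold, n_draws=200_000, seed=1957):
    """Estimate P(chi2_{k-3} + lam1*y1^2 + lam2*y2^2 > threshold) by simulation."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        # A chi-square variable with m degrees of freedom is Gamma(shape=m/2, scale=2).
        x2 = rng.gammavariate((k - 3) / 2.0, 2.0)
        x2 += lam1 * rng.gauss(0.0, 1.0) ** 2
        x2 += lam2 * rng.gauss(0.0, 1.0) ** 2
        if x2 > threshold:
            exceed += 1
    return exceed / n_draws

# Weights as read from Table 2, row k = 10, h = 0.5.
p_hat = tail_probability(k=10, lam1=0.025, lam2=0.103, threshold=14.067)
print(round(p_hat, 3))
```

With this many draws the Monte Carlo standard error is about 0.0005, so the estimate should fall between 0.05 and 0.06, in line with the rule stated above.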
The methods developed above have a bearing on a slightly different kind of problem,
brought to my attention by J. B. Douglas. J. F. McIllwraith of the Storm Water Standards
Committee, Sydney, examined the primary rain falls at 142 different stations to see whether
the hypothesis of log-normal rain falls was tenable. A goodness-of-fit $X^2$ was calculated
from the results at each station, six equally probable intervals being used so that the cell
boundaries were given by

$$-\infty,\ \bar{x} - 0.9674s,\ \bar{x} - 0.4307s,\ \bar{x},\ \bar{x} + 0.4307s,\ \bar{x} + 0.9674s,\ +\infty.$$
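The multipliers 0.4307 and 0.9674 are the standard normal quantiles at probabilities 1/3 and 1/6. A minimal sketch of how such cut points can be obtained for any number of equally probable cells follows; the function name `equiprobable_cuts` is illustrative and not from the paper.

```python
from statistics import NormalDist

def equiprobable_cuts(k):
    """Standard-normal cut points that split the line into k equally probable cells."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(i / k) for i in range(1, k)]

cuts = equiprobable_cuts(6)
print([round(c, 4) for c in cuts])
```

Each cell boundary for a sample is then the sample mean plus the corresponding cut point times the sample standard deviation.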


The mean $X^2$ was 3.80 and Douglas observed that, because of the work of Chernoff & Lehmann,
this need not imply that the hypothesis should be rejected. With the results of the
present paper, the matter may be taken a step further since these $X^2$ have been calculated
exactly on the assumptions of this paper. In particular, from Table 1, $k = 6$ gives $\lambda_1 = 0.081$,
$\lambda_2 = 0.459$, so that on the null hypothesis, these $X^2$ are distributed as

$$\chi^2_3 + 0.081\,\chi^2_1 + 0.459\,\chi^2_1.$$

Thus $E(X^2) = 3.540$,
$\operatorname{var}(X^2) = 6.217$.
The mean of 142 values of $X^2$ will be approximately normal with mean 3.540 and variance
0.04378. Since

$$\frac{3.80 - 3.54}{\sqrt{0.04378}} = 1.24,$$

the data, assuming homogeneity and independence of stations, do not refute the hypothesis.
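The arithmetic of this comparison can be retraced in a few lines. The sketch below takes the weights, the number of stations, the observed mean and the variance of a single $X^2$ exactly as quoted in the text; only the variable names are introduced here.

```python
import math

# Values quoted in the text for k = 6 equally probable cells.
lam1, lam2 = 0.081, 0.459
n_stations = 142
observed_mean = 3.80
var_x2 = 6.217  # variance of a single X^2, as stated in the text

# Mean of chi2_3 + lam1*chi2_1 + lam2*chi2_1.
expected_mean = 3 + lam1 + lam2
var_of_mean = var_x2 / n_stations
z = (observed_mean - expected_mean) / math.sqrt(var_of_mean)
print(round(expected_mean, 3), round(var_of_mean, 5), round(z, 2))
```

A standardized deviation of this size is well inside the usual normal limits, which is the basis for the conclusion drawn above.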


The author is grateful to Prof. P. A. P. Moran and Dr E. J. Hannan for some helpful
discussions, to Mr J. A. Macdonald for providing an unpublished method of deriving the
distribution of $X^2$, to Mr J. B. Douglas and Mr J. F. McIllwraith for providing the data in
§ 5, and to the Editor and the Referee for the references to previous discussions of this
problem.

: draw the author's attention to a 
anie ] ou unately failed to TON 
Ebert Nate aed in red TIS lr a ot UTES ra (d ad pad 

. E. Barton, ent! ici , 9 
"C0; Mii Up d : " h : pe 
ih dan , fead at the inten ere 45. Taking the porn lage: iei Y° statistic, 
which ere Deco ares Barton ‘considered the use of variable class intervals and reached 
; es X? in a spi , A 
* Conclusion very similar to Watson’s. Before ae is work. E.S.P.] 
© examine the relation of these earlier results to 
APPROXIMATIONS TO THE DISTRIBUTIONS OF SOME MEASURES
OF DISPERSION BASED ON SUCCESSIVE DIFFERENCES


By Y. S. SATHE AND A. R. KAMAT
Fergusson College, Poona (India)


1. INTRODUCTION


The usual measures of dispersion such as the sample variance, the sample mean deviation,
the sample range, etc. cannot be used when the mean of the population is undergoing a trend,
since they are likely to be seriously affected by the trend. Under these circumstances and
especially if the trend is a slow-moving one the following four measures of dispersion based
on successive differences have been proposed:

$$\delta^2 = \frac{1}{n-1} \sum_{i=1}^{n-1} (x_i - x_{i+1})^2, \qquad (1)$$

$$d = \frac{1}{n-1} \sum_{i=1}^{n-1} |x_i - x_{i+1}|, \qquad (2)$$

$$\delta_2^2 = \frac{1}{n-2} \sum_{i=1}^{n-2} (x_i - 2x_{i+1} + x_{i+2})^2, \qquad (3)$$

$$d_2 = \frac{1}{n-2} \sum_{i=1}^{n-2} |x_i - 2x_{i+1} + x_{i+2}|. \qquad (4)$$


The theory of the distribution of these statistics and their usefulness has been discussed
in some detail by Von Neumann, Kent, Bellinson & Hart (1941) and Kamat (1953a, b,
1954) among others. These measures of dispersion are considerably less efficient as estimators
than the sample variance, $s^2$, when the mean of the parent population remains constant. But
$s^2$ suffers from a heavy bias if the mean is undergoing a trend while statistics based on the
successive differences are comparatively unaffected.
It is not easy, however, to obtain the distribution of these statistics. Except for a modified
form of $\delta^2$ (for which see Kamat, 1955), it has not yet been possible to find the exact distributions
of these statistics. The procedure so far has been to approximate their distributions by
appropriate Pearson-type curves based on their first three or four moments. Apart from the
laborious computation involved in the fitting of Pearson curves to these measures of dispersion,
a serious drawback to these approximations is that they cannot be used readily for
comparing two independent estimates of variability about the trend taken over two different
periods of time.

Recently Cadwell (1953, 1954) has successfully used a power of $\chi^2$ to obtain approximate
distributions to some measures of dispersion such as the mean deviation, the mean range and
others. Following this method, one assumes that a statistic $u$ is approximately distributed
as $(\chi^2_\nu/c)^\alpha$ or, taking $\lambda = 1/\alpha$, that $cu^\lambda$ is approximately distributed as $\chi^2$ with $\nu$ degrees of
freedom. The constants $c$, $\alpha$ (or $\lambda$) and $\nu$ are then determined by equating the first three
moments.

In this paper we have used this method to obtain approximations to the distributions of
each of the four statistics mentioned above. The results show that over a wide range of
sample size the same power of $\chi^2$, i.e. a constant value of $\lambda$, can be used for each of
these statistics. Consequently, it not only becomes easy to find approximate percentage
points by using percentage points of $\chi^2$, but it is also now possible to compare two different
estimates of variability by use of the $F$ test. This is because we may write

$$F = \frac{c_1 \nu_2}{c_2 \nu_1} \left(\frac{u_1}{u_2}\right)^{\lambda}, \qquad (5)$$

where $c_1$, $c_2$, and $\nu_1$, $\nu_2$ are the scale parameters and equivalent degrees of freedom, found
from Table 1, of the two estimators to be compared and $\lambda$ is the common power.


Table 1. Values of v and logc for measures of dispersion 
based on successive difference, keeping À constant 


63/02 dajo 
À = 0-7770 À = 1-3219 À = 0:7353 A= 1:2195 
zl = k i 

logc v logc v logyc v log;oc 
0-4775 6-053 | 0-6957 3-528 | 0-0135 4-562 | 0-0288 
0-5618 7-434 | 0-7880 — > — — 
0:6326 8:816 | 0-8642 5-398 | 0-1863 7.042 | 0:4822 
0:6933 10-193 | 0-9288 E p = | = 
0:7465 11-579 | 0-9853 = = mes di Es 
0-7941 12-963 | 1-0353 8-24 0-3615 10-815 | 0-6720 
0-8368 14-343 | 1-0800 


i 0:6853 | 23.45 1:0119 
12010 | 33-74 | 14557 | 2250 07868 | 29.78 | 1-1164 
i 27-26 | ©8600: | 3612 | 1.2007 
1:4076 54-50 1-6651 36-77 0-9975 are j88To 
1:5052 68-33 1-7637 i: 


8 Will be useq for 
n-square differences, à? 
on than those based on 
in using the absolute diffe 


ased on the mea; 
of discriminati 


very small samples. 
or 2, will in general 
the mean absolute 


4 
in calculation, (ii) greater accuracy in the $(\chi^2_\nu/c)^\alpha$ approximation (see Tables 3, 6, 7 below),
(iii) greater robustness if the variation is not normal.


2. EXAMPLES 


Before proceeding to examine these approximations in more detail, we shall illustrate the 
use of the proposed variance-ratio test in comparing variability about the trend in a time 
series. The data used are dust readings taken with a Tyndallometer in a mine shaft. The 
readings were taken at half-minute intervals and the four series A, B, C and D were collected 
on the same morning during different periods, starting at the times shown. We are indebted 
to the Director of the Safety in Mines Research Establishment, Sheffield, for permission to 
use these data. The figures quoted are Tyndallometer readings multiplied by 1000, and the 


Series A (starting at 173 176 130 153 137 144 95 137 131 126 
10.04 hr.) 167 | 118 | 99 | 140 | 111 81 83 | 106 | 104 | 117 
78 88 119 87 124 | 125 | 102 | 109 92 | 114 

as lato [O0 |. [o aS aS 

Series B (starting at 70 | æ | 52 | 49 | 30 | 38 | 45 | 48 | 38 | 57 
10.203 hr.) 40 51 56 | 58 39 25 33 45 45 34 
43 42 20 34 18 30 23 23 19 30 

Series C (starting at 33 43 37 40 43 40 31 36 51 36 
10.47 hr.) 45 43 36 74 79 69 85 95 | 128 | 136 
Series D (starting at 99 59 | 165 | 108 | 167 145 |113 | 124 | 117 | 119 
10.57 hr.) 162 | 187 | 115 | 186 160 | 106 | 142 | 110 | 153 | 165 
137 99 | 106 FLECTERE ee 


sequences in time read from left to right and from line to line. The four series are plotted in
Fig. 1. It is clear that a trend is present in series A, B and C and possibly in D. For random
series, it is known that*

$$E(\delta^2) = 2\sigma^2; \quad E(d) = 2\sigma/\sqrt{\pi} = 1.1284\sigma; \quad E(\delta_2^2) = 6\sigma^2; \quad E(d_2) = 2\sqrt{3}\,\sigma/\sqrt{\pi} = 1.9544\sigma.$$

The upper section in Table 2 gives the four mean successive differences as defined in
equations (1)-(4) above for each series. Below these are five estimates of the residual
standard deviation, $\sigma$, i.e.

$$\sqrt{\Sigma(x_i - \bar{x})^2/(n-1)}, \quad \sqrt{\tfrac{1}{2}\delta^2}, \quad d/1.1284, \quad \sqrt{\tfrac{1}{6}\delta_2^2} \quad \text{and} \quad d_2/1.9544.$$

For series D the random fluctuations are so great that the removal of a linear or parabolic
trend has scarcely reduced the estimate of variability. For the other three series the removal
of a linear trend has an appreciable effect and for Series B and C there is a further, if smaller,
reduction after allowance is made for a parabolic trend in the estimates based on $\delta_2^2$ and $d_2$.
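The four measures and the five estimates of $\sigma$ can be sketched as follows, using the twenty readings of Series C as printed above. The function names are illustrative, and the divisors $2/\sqrt{\pi}$ and $2\sqrt{3}/\sqrt{\pi}$ are computed rather than rounded to 1.1284 and 1.9544.

```python
import math

def successive_difference_measures(x):
    """Return (delta2, d, delta2_2, d2) as defined in equations (1)-(4)."""
    n = len(x)
    first = [x[i] - x[i + 1] for i in range(n - 1)]
    second = [x[i] - 2 * x[i + 1] + x[i + 2] for i in range(n - 2)]
    delta2 = sum(v * v for v in first) / (n - 1)
    d = sum(abs(v) for v in first) / (n - 1)
    delta2_2 = sum(v * v for v in second) / (n - 2)
    d2 = sum(abs(v) for v in second) / (n - 2)
    return delta2, d, delta2_2, d2

def sigma_estimates(x):
    """Five estimates of sigma: the usual one, then one from each difference measure."""
    n = len(x)
    mean = sum(x) / n
    usual = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    delta2, d, delta2_2, d2 = successive_difference_measures(x)
    return (usual,
            math.sqrt(delta2 / 2.0),
            d * math.sqrt(math.pi) / 2.0,
            math.sqrt(delta2_2 / 6.0),
            d2 * math.sqrt(math.pi) / (2.0 * math.sqrt(3.0)))

series_c = [33, 43, 37, 40, 43, 40, 31, 36, 51, 36,
            45, 43, 36, 74, 79, 69, 85, 95, 128, 136]
measures = successive_difference_measures(series_c)
estimates = sigma_estimates(series_c)
print([round(v, 3) for v in measures])
print([round(v, 2) for v in estimates])
```

The usual estimate is roughly three times the difference-based ones for this series, which reflects the trend that the successive differences largely remove.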


We shall now use the variance-ratio approximation to compare the residual variation in (i) A
and D, (ii) B and C. We have from equation (5),

$$\log F = \log c_1 - \log c_2 + \log(\nu_2/\nu_1) + \lambda \log(u_1/u_2), \qquad (6)$$

where the parameters $\log c_1$, $\log c_2$, $\nu_1$, $\nu_2$ and $\lambda$ derived from Table 1 are given in Table 2†
while $u_1$ and $u_2$ are the corresponding values of $\delta^2$, $d$, $\delta_2^2$ or $d_2$ for the two series compared, also
taken from Table 2.

* Values for $d$ and $d_2$ have been taken from Kamat (1953b, p. 119) and (1954, p. 7), respectively.
† Interpolation in Table 1 was necessary for the ...


Comparison of A and D

Comparison based on     $F$     $\nu_1$     $\nu_2$     Comment on significance
$\delta^2$                      26.94     35.78     Just significant at 2.5 % level
$d$                    1.78     33.74     44.80     Between 5 and 2.5 % levels
$\delta_2^2$                    22.50     30.11     Between 5 and 2.5 % levels
$d_2$                           29.78     39.92     Just not significant at 10 % level

The analysis suggests that the residual variation in Series D is greater than in A. We note,
however, two points: (a) that the tests based on the mean-square differences, in this case at
any rate, give clearer verdicts on significance than those based on the corresponding mean
absolute differences; (b) that nothing has been gained by using second-difference estimates.


Fig. 1. Tyndallometer readings (× 1000) of dust in a mine shaft, taken at half-minute intervals.
Comparison of B and C
In this case we have only used $\delta^2$ and $\delta_2^2$; we find the following results:

Comparison based on     $F$     $\nu_1$     $\nu_2$     Comment on significance
$\delta^2$             1.05     21.43     32.41     Not significant
$\delta_2^2$           1.13     27.26     17.74     Not significant

The charts and standard deviation estimates given in Table 2 suggest that it is necessary to go
at least to second differences before a legitimate comparison can be made, but clearly neither
test provides any evidence for a difference in residual variation between the two series.
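Equation (6) is simple enough to evaluate directly. The sketch below recomputes two of the variance ratios from the entries of Table 2 (common logarithms throughout); the function name is illustrative, and the choice of which series forms the numerator follows the degrees of freedom quoted in the comparison tables.

```python
import math

def log10_variance_ratio(log_c1, log_c2, nu1, nu2, lam, u1, u2):
    """Equation (6): log10 F for two estimates sharing the common power lam."""
    return log_c1 - log_c2 + math.log10(nu2 / nu1) + lam * math.log10(u1 / u2)

# Series D against Series A, based on d (values from Table 2).
f_d = 10 ** log10_variance_ratio(1.4557, 1.5798, 33.74, 44.80, 1.3219, 35.250, 22.781)
# Series B against Series C, based on the second-difference mean square.
f_d22 = 10 ** log10_variance_ratio(0.8690, 0.6853, 27.26, 17.74, 0.7353, 501.04, 418.44)
print(round(f_d, 2), round(f_d22, 2))  # compare with F = 1.78 and F = 1.13 in the text
```

Each ratio is then referred to the $F$ distribution with $(\nu_1, \nu_2)$ degrees of freedom, interpolating for the fractional values.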


Table 2. Data for comparison of Series A, B, C and D

                                               A          B          C          D
$n$                                           33         30         20         25
$\delta^2$                                739.03     194.28     206.89    1852.25
$d$                                       22.781     10.759     10.895     35.250
$\delta_2^2$                             2282.35     501.04     418.44    5914.96
$d_2$                                     43.355     17.750     16.889     61.043

Estimates of $\sigma$ from
    $\Sigma(x_i - \bar{x})^2$              26.32      12.85      31.35      33.39
    $\delta^2$                             19.22       9.85      10.17      30.43
    $d$                                    20.19       9.53       9.65      31.24
    $\delta_2^2$                           19.50       9.14       8.35      31.40
    $d_2$                                  21.67       9.08       8.64      31.23

Parameters from Table 1
    For $\delta^2$      $\lambda$         0.7770     0.7770     0.7770     0.7770
                        $\nu$              35.78      32.41      21.43      26.94
                        $\log c$          1.3235     1.2814     1.1028     1.2010
    For $d$             $\lambda$         1.3219        -          -       1.3219
                        $\nu$              44.80        -          -        33.74
                        $\log c$          1.5798        -          -       1.4557
    For $\delta_2^2$    $\lambda$         0.7353     0.7353     0.7353     0.7353
                        $\nu$              30.11      27.26      17.74      22.50
                        $\log c$          0.9120     0.8690     0.6853     0.7868
    For $d_2$           $\lambda$         1.2195        -          -       1.2195
                        $\nu$              39.92        -          -        29.78
                        $\log c$          1.2447        -          -       1.1164
3. BASIC APPROXIMATION


Cadwell (1953) has shown that if $u$ is approximately distributed as $(\chi^2_\nu/c)^\alpha$, then $\nu$ and $\alpha$ are
approximately given by the equations


1- E SIRO HABD, 


RE | ih (7) 
(3V — Jf, 2(3V — fj.) 


aL HR. 
are the values of the coefficient of variation and the /; 
en obtained from the equation 


$$\log c = \log 2 + \lambda\{\log \Gamma(\alpha + \tfrac{1}{2}\nu) - \log \Gamma(\tfrac{1}{2}\nu) - \log E(u)\}, \qquad (9)$$
where A = a~t, These relations have bee: 


a= 


where a = VA(3v), and V and f, 
constant of u. The constant c is th 


was found that linear i 


points of X^ is quite adequate for this purpose. 


~ , 
INCE: 62 


= 5, 10, 20, 30, 50. 


Table 3. A 


Pproximation to §%/o% by (x2 Jo)” 
n 


loge Ba difference 


A, difference By difference 

5 ge A Es =. E 
“io 1040 | ones pum +0-1007 —0-0013 +0:0913 
20 214 | 0.7776 meee +0-0380 --0-0001 + 0-0368 
30 325 | 0 7765 13810 Ris --0:0003 +0:0180 
i +0-0123 +0:00 -0113 
50 54-4 0-7781 1-5036 +0-0050 E bust HU rd 

A 

* Following Cadwell (1953), 


for «= 


we obtain 2, and, late: 4 x f 
(3x5) » the rth moment about no * f, values for the ®pproximation by noting that 


kr =T (ra + byT(». 
A variable value of $\lambda$ does not permit us to construct tests corresponding to $F$ tests. But
the fuller table of values of $\lambda$ (which is not reproduced above) indicated that a suitable fixed
value of $\lambda$ may be used as a compromise value for $n > 5$, thus allowing us the comparison of
two independent estimates of variability based on $\delta^2$. It was found that if we take $\lambda = 0.7770$
it yields reasonably good approximation for $n > 5$. Table 1 gives the values of $\nu$ and $\log c$ for
the fixed $\lambda = 0.7770$ for $n = 5(1)\,20, 25, 30, 40, 50$. In this case $\nu$ is calculated from the
equation

$$\nu = \nu_0 + 0.0824 - 0.1436\,\nu_0^{-1} + \ldots, \qquad (10)$$

where $\nu_0 = 3.3127/V^2$, $V^2$ being the square of the coefficient of variation of $\delta^2$. But this
formula is useful only for $n \ge 9$, while for $n < 9$ the exact value of $\nu$ is obtained by inverse
linear interpolation for the value of $V^2$, by using Brownlee's tables. Log $c$ was then evaluated
from the relation (9) above.
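Relation (9) can be evaluated with a log-gamma routine. The sketch below works in natural logarithms internally and returns the common logarithm of $c$; the function name is illustrative. The two calls use $E(\delta^2/\sigma^2) = 2$ and $E(d/\sigma) = 2/\sqrt{\pi}$, with the values of $\nu$ quoted in Table 2 for $n = 20$ and $n = 25$, so the results can be compared with the tabulated $\log c$ of 1.1028 and 1.4557.

```python
import math

def log10_c(nu, lam, mean_u):
    """Equation (9): log10 of the scale constant c, given nu, lam = 1/alpha and E(u)."""
    alpha = 1.0 / lam
    ln_ratio = math.lgamma(alpha + nu / 2.0) - math.lgamma(nu / 2.0) - math.log(mean_u)
    return math.log10(2.0) + lam * ln_ratio / math.log(10.0)

# delta^2/sigma^2 has mean 2; nu = 21.43 is the entry for n = 20.
log_c_delta2 = log10_c(21.43, 0.7770, 2.0)
# d/sigma has mean 2/sqrt(pi); nu = 33.74 is the entry for n = 25.
log_c_d = log10_c(33.74, 1.3219, 2.0 / math.sqrt(math.pi))
print(round(log_c_delta2, 4), round(log_c_d, 4))
```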
We should expect that $\beta_1$ will not match exactly when $\lambda$ is kept fixed. As mentioned
above, in this case also we have retained two or three places of decimals in $\nu$ so as to match $\beta_1$
as closely as possible. The discrepancy between the $\beta_1$ value of the approximation and the
true value of $\beta_1$ is now small, and it was found that it is only in the fourth decimal place for
$n > 5$. Again, the discrepancy between the $\beta_2$ values of the approximation with a fixed $\lambda$ and
the true value of $\beta_2$ is of the same order as the discrepancy in these values resulting
from a variable $\lambda$. Both these facts are illustrated by columns 6 and 7 of Table 3.


5. COMPARISON OF APPROXIMATE PERCENTAGE POINTS OF $\delta^2/\sigma^2$


The approximate percentage points of the distribution of $\delta^2/\sigma^2$ can now be readily obtained
by the use of the table of percentage points of the $\chi^2$ distribution constructed by Hald &
Sinkbaek (1950). Ordinary linear interpolation in these tables for fractional degrees of
freedom is quite adequate to obtain the approximate significance points correct to two
places of decimals.
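The procedure just described amounts to interpolating a tabulated $\chi^2$ point at the fractional $\nu$ and inverting $cu^\lambda = \chi^2$. A minimal sketch for $n = 20$ follows, using $\nu = 21.43$, $\log c = 1.1028$ and $\lambda = 0.7770$ from Table 2 and the standard 5 % points of $\chi^2$ for 21 and 22 degrees of freedom; the function names are illustrative.

```python
def interpolate(nu, nu_lo, value_lo, nu_hi, value_hi):
    """Linear interpolation of a tabulated chi-square point for fractional nu."""
    weight = (nu - nu_lo) / (nu_hi - nu_lo)
    return value_lo + weight * (value_hi - value_lo)

def percentage_point(chi2_point, log10_c, lam):
    """Invert c * u**lam = chi2 to get the matching percentage point of u."""
    c = 10.0 ** log10_c
    return (chi2_point / c) ** (1.0 / lam)

# Upper and lower 5 % points of chi-square for 21 and 22 degrees of freedom.
upper_chi2 = interpolate(21.43, 21, 32.671, 22, 33.924)
lower_chi2 = interpolate(21.43, 21, 11.591, 22, 12.338)
upper = percentage_point(upper_chi2, 1.1028, 0.7770)
lower = percentage_point(lower_chi2, 1.1028, 0.7770)
print(round(lower, 2), round(upper, 2))
```

The two results can be set against the lower and upper 5 % entries for $n = 20$ in Table 5 (0.92 and 3.46).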


B : ji i: 7 2g? 
Table 4. Comparison of upper 5 % points gwen by various approximations to 8? |o 


— 


n= 20 


Lower Upper 
Approximations Lower 


, | 1% | 1% | 5% | 5% | 1% 


F. i an | 063 | 090 | 345 | 427 
Moore (1) 0-49 0-77 3 m 478 | 0-68 | 093 3-46 4-32 
Moore (2) 0-56 | 080 | 372 | grg | œ66 | 092 | 346 | 432 
(xi/e)*, æ fixed 953 | O79 | 97] | 3g | 067 | 0:92 | 346 | 432 


earson type VI 


ested for the distribution of the mean- 
55) has suggested two approximations: 
btained by matching the first two 
ts. Gayen & Jogdeo (1955) have 


Various approximations have been recently Suge 


8 

i Mare successive difference. For instance, Moore (19 “a 
) Por. cx2/y, and (2) 92/0? ~ ex? [V + eo: The first 15 nen 
ments and the second by matching the first three mo 
followed the method suggested by Cornish & Fisher (1937) of using the exact distribution of
the sample variance $s^2$ as a first approximation to that of $\delta^2$, and utilizing the first four
cumulants of $s^2$ and $\delta^2$ for obtaining approximations of higher order. We give in Table 4
comparisons of percentage points of $\delta^2/\sigma^2$ for $n = 15$ and 20 obtained from the two approximations
given by Moore (named Moore (1) and Moore (2)), the approximation $(\chi^2_\nu/c)^\alpha$ for fixed $\alpha$
suggested here and the Pearson type VI approximation based on the first four moments. Of
these the last may be considered the closest to the true distribution of $\delta^2/\sigma^2$ as it is based on


Table 5. Percentage points for the approximate distribution of à?|a? (A constant) 


Lower Upper 

n - bs d 
0:5 10 2-5 5 5 25 1-0 05 
5 0-07 0-11 0-18 0-27 5-24 6:35 7.84 5:93 
6 011 0-16 0-25 0-35 4-91 5-87 7-13 8:09 
7 0-16 0-21 0-31 0-43 4-66 5-50 6-61 bs 
8 0-20 0-26 0-37 0-49 4-46 5-21 6-20 e 
9 0-24 0-31 0-42 0-55 4-30 499 5-88 6-55 
10 0-28 0-35 0-49 0-60 4-16 5 5/22 

4-80 5-62 
" 0-32 0-39 0-53 0-65 4-04 4-64 5-40 5:97 
12 0-36 0-43 0-56 0-69 3-94 4-50 5-21 ane 
13 0-39 0-46 059 | 0-73 3-86 4-38 5-05 554 
4 0-42 0-50 0-63 ; : ted 
1 6 0-76 3-78 4-28 4-91 5 
15 0-45 0-53 0-66 0-79 3-71 5:22 
q Ay “78 ore 
16 048 | 0-56 0-69 0-82 3-65 nbi Pia 5:08 
17 051 | 059 0-72 0-85 3-60 4-03 4-57 207 
18 0-53 0-61 0-75 0-88 3-54 3-96 4-48 4:56 
19 0-56 0-64 0-77 0-90 3-50 3-90 4-40 AC 
20 0-58 0-66 0-79 0-92 
3-46 3. T don 
25 0-68 0-76 0-89 1-02 3-29 *3 in 432 
T 0-76 0-84 0-97 1:09 3116 345 3-81 4:08 
40 oes 0-96 1-08 1-20 2-99 3.93 3-53 374 
| 1-01 147 1-27 2-88 3-09 3-34 Bun 
L 


It appears from this comparison that the aPproximation Gc)» i ite close to the 
Pearson-type approximation. The approximation, Moore (2 = T Ü à first three 
moments is equally close; however, 7? tests based on it will a sin a viis base 
on (X;/c)" for a fixed æ. The approximation, Moore (1), which w ad be em is structing 
F tests does not appear to be sufficiently close in the tails of ih Corn es niall 

We have not seen in the relevant papers on à? any table of s P ercentage 
points. We are therefore giving in Table 5 the approximate ser cai n - 0-5, 10: 
2-5 and 5-0 % points for 0?|g? foy n = 5(1) 20, 25, 30, 40 and 50 cael horse fixed. 

ty $ 
6. THE MEAN SUCCESSIVE DIFFERENCE


In the same manner using the moments given by Kamat (1953b, p. 119), $\nu$, $\lambda$ and $\log c$ were
obtained for the mean successive difference $d/\sigma$. A compromise value for a common $\lambda$ was
then found as $\lambda = 1.3219$, and for this fixed value of $\lambda$, $\nu$ was calculated from the formula

$$\nu = \nu_0 + 0.0593 - 0.0610\,\nu_0^{-1} + \ldots, \qquad (11)$$

where $\nu_0 = 1.1446/V^2$, $V$ being the coefficient of variation for $d/\sigma$. Log $c$ was then obtained
from the relation (9).* Table 6 gives values of $\nu$, $\lambda$ and $\log c$ for the representative sample
sizes $n = 5, 10, 20, 30, 50$ and columns 5, 6 and 7 in that table give the difference in $\beta_2$ values
for a variable $\lambda$, difference in $\beta_1$ values for the fixed $\lambda$ and the difference in $\beta_2$ values for the
fixed $\lambda$, respectively. Table 1 gives values of $\nu$ and $\log c$ for this fixed $\lambda = 1.3219$ for
$n = 5(1)\,20, 25, 30, 40, 50$. We need not repeat the arguments and the conclusions in this case
which are similar to those in the case of $\delta^2/\sigma^2$ presented in § 4 above. A comparison of the
$\beta_2$ differences in Tables 3 and 6 suggests that the approximation to the $d/\sigma$ distribution may

Table 6. Approximation to $d/\sigma$ by $(\chi^2_\nu/c)^\alpha$

                                                                    $\lambda = 1.3219$
 $n$     $\nu$     $\lambda$     $\log c$     $\beta_2$ difference     $\beta_1$ difference     $\beta_2$ difference
  5       6.16     1.3101       0.7047          +0.0445                 -0.0066                 +0.0301
 10      13.02     1.3189       1.0375          +0.0157                 -0.0008                 +0.0144
 20      26.8      1.3221       1.3549          +0.0070                 -0.0001                 +0.0077
 30      40.5      1.3242       1.5354          -0.0015                 +0.0002                 +0.0065
 50      67.9      1.3261       1.7608          +0.0058                 +0.0001                 +0.0028


be closer than that for $\delta^2/\sigma^2$. In his 1953 paper (p. 120) Kamat gave a table of percentage
points of the distribution of $d/\sigma$ obtained by using a Pearson-curve approximation, having
the correct first four moments. We have compared these results with those obtained from the
$(\chi^2_\nu/c)^\alpha$ approximation now recommended (with $\lambda = 1.3219$) at $n = 5$, 10 and 50 at both upper
and lower 0.5, 1.0, 2.5 and 5.0 % points and have found a most satisfactory agreement. The
two series of figures were usually identical to the two decimal places given and never differed
by more than a unit in the second decimal place. We do not, of course, know the true
distribution of $d/\sigma$, but general experience suggests that the 4-moment Pearson curve
representation is likely to be good.† We think, therefore, that our present approximation in
terms of a power of $\chi^2$ is likely to be entirely adequate for our purpose.


* Unlike $\delta^2/\sigma^2$ the formula (11) was found to be useful for all $n \ge 5$ and linear interpolation in
Brownlee's tables for $V^2$ was not necessary.

† In the single case $n = 3$, where the distribution of $d/\sigma$ is far from normal, having $\beta_1 = 1.0356$,
Kamat (1953b, p. 121) was able to derive the true distribution and found that the 4-moment
fit corresponded very closely with the true curve.


7. STATISTICS BASED ON SECOND SUCCESSIVE DIFFERENCES: $\delta_2^2/\sigma^2$ AND $d_2/\sigma$

The same method, taking the moments given by Kamat (1954), was used to
approximate to the distributions of $\delta_2^2/\sigma^2$ and $d_2/\sigma$. Here also by trial we found that a fixed
$\lambda = 0.7353$ is most suitable for $\delta_2^2/\sigma^2$ for $n \ge 7$ and a fixed $\lambda = 1.2195$ is adequate for $d_2/\sigma$
for $n \ge 5$.*


In the case of $\delta_2^2/\sigma^2$ for the fixed $\lambda = 0.7353$, $\nu$ was obtained from

$$\nu = \nu_0 + 0.1296 - 0.2406\,\nu_0^{-1} + \ldots, \qquad (12)$$

where $\nu_0 = 3.6992/V^2$, $V$ being the coefficient of variation of $\delta_2^2$ (this formula is useful for
$n \ge 10$; for $n < 10$, the exact value of $\nu$ is obtained by linear interpolation as explained in § 4).
For $d_2/\sigma$ with $\lambda = 1.2195$ kept fixed, $\nu$ was calculated from

$$\nu = \nu_0 + 0.0324 - 0.0358\,\nu_0^{-1} + \ldots, \qquad (13)$$

Table 7. $\beta_1$ and $\beta_2$ differences for the $(\chi^2_\nu/c)^\alpha$ approximation to the distributions
of $\delta_2^2/\sigma^2$ and $d_2/\sigma$ using a fixed $\lambda$

            $\delta_2^2/\sigma^2$, $\lambda = 0.7353$                  $d_2/\sigma$, $\lambda = 1.2195$
 $n$     $\beta_1$ difference     $\beta_2$ difference     $\beta_1$ difference     $\beta_2$ difference
  5          +0.0749                 +0.4517                 -0.020                  +0.088
  7          -0.0008                 +0.1400                 -0.008                  +0.045
 10          -0.0008                 +0.0782                 -0.001                  +0.031
 15          +0.0005                 +0.0482                 +0.001                  +0.022
 20          +0.0014                 +0.0354                 +0.002                  +0.016
 25          +0.0009                 +0.0273                 +0.002                  +0.013
 30          +0.0009                 +0.0222                 +0.001                  +0.011
 40          +0.0007                 +0.0162                 +0.002                  +0.006
 50          +0.0006                 +0.0129                 +0.002                  +0.007


where $\nu_0 = 1.3448/V^2$ and $V$ is the coefficient of variation of $d_2$. In both cases $\log c$ was then
calculated from the equation (9).

Values of $\nu$ and $\log c$ for $\delta_2^2/\sigma^2$ and $d_2/\sigma$ for sample sizes $n = 5, 7, 10, 15, 20, 25, 30, 40, 50$
have been already given in Table 1. Discrepancies in the $\beta_1$ and $\beta_2$ values for fixed values of
$\lambda$ are given in Table 7. We have not discussed here these approximations to the distributions
of $\delta_2^2/\sigma^2$ and $d_2/\sigma$ to the extent we have done above in the case of $\delta^2/\sigma^2$ and $d/\sigma$, since they are
in less frequent use than the latter.

In conclusion we wish to thank Prof. E. S. Pearson, who has drawn the attention of one of
us to Cadwell's approximation and suggested improvements in the drafting of the paper.
* For n = 5 the difference in £, was considerabl 


but not so for d,/o. This is borne out by the follo 


2/8, 
Y larger for a fixed A than for a variable A for ail d 
with those given in Table 7. 


i 'et 
wing values of £, differences for variable A comp®” 


| | 
| i | 5 A Ps difference 
63/0? 5 3-418 0-7463 
+0-2939 
dajo 5 | 4-71 | 1-1998 Doe 
[i 
QUEUEING WITH BALKING


By FRANK A. HAIGHT
Auckland University College, New Zealand


1. INTRODUCTION


In dealing with problems of queueing, several writers (Kolmogoroff, 1932; Erlang*;
Kendall, 1951, 1953; Lindley, 1952; Takács, 1955) have discussed the situation where
queue stability is obtained by assuming that the demand for service does not overload the
service mechanism. Thus $\lambda$, the average number of arrivals per unit time, is assumed to be
less than $\mu$, the average number of departures per unit time, so that their ratio, $\rho$, is less
than unity. Kawata (1955), on the other hand, has shown that queue stability can also be
obtained by assuming that, although arrivals occur more frequently than departures, some
arrivals choose not to join the queue. In theory this case can be included in the original

ing À to be computed only from the values provided by those who 


sidered under two genera 


l headings: (a) those relating to the importance of being served, 
and (b) those relating to 


shall, therefore, make the 


obstacle presented by the queue 


simplifying assumption that the individual measures the 
rrives, which will be denoted by k(t). 


by its length when he a 

The factors included 
ranging from absolute urgency, 
absolute indifference, so that no n 


ated, and may produce an opinion 
bitrary length will be joined, to 
ined. It will be assumed that thes? 


These results are (implicitly) s 
Reuter (1957). We derive them b 
for various balking distributions 


pecial cases of a general set, 
y assuming equilibrium, ang 


“Up considered by Kendall & 
investigate them numerically 


* See Brockmeyer et al, (1948). 
2. DIFFERENTIAL EQUATIONS

Suppose that both arrivals and departures occur as events of homogeneous Poisson processes
of density $\lambda$, $\mu$, respectively. Let

$$P(x,t) = \Pr(k \le x \text{ at time } t), \qquad F(x) = \Pr(K \le x),$$

$$Q(x,t) = 1 - P(x,t), \qquad G(x) = 1 - F(x),$$

$$p(x,t) = P(x,t) - P(x-1,t), \qquad f(x) = F(x) - F(x-1).$$

Also $f(0) = F(0)$ = probability that an individual is absolutely queue-resistant.
The distribution of $K$ will be called the balking distribution.
Given a queue of length $x$, then the probability that an arrival joins it

$$= \Pr(\text{his balking value} \ge x) = G(x-1).$$
Note that in general p( 


The build-up of differential equations, following Feller, is

$$p(x, t+\Delta t) = [1 - (\lambda+\mu)\Delta t]\,p(x,t) + \mu p(x+1,t)\Delta t + \lambda p(x-1,t)G(x-2)\Delta t + \lambda p(x,t)F(x-1)\Delta t + O(\Delta t)^2,$$

where the four contributions on the right-hand side are from

(a) no arrivals or leavers in $\Delta t$,
(b) one leaver in $\Delta t$,
(c) one arrival in $\Delta t$ who joins,
(d) one arrival in $\Delta t$ who balks,

respectively.
Take $p(x,t)$ from both sides, divide by $\Delta t$, and take the limit

$$(\partial/\partial t)\,p(x,t) = -(\lambda+\mu)\,p(x,t) + \mu p(x+1,t) + \lambda p(x-1,t)G(x-2) + \lambda p(x,t)F(x-1),$$

where, if $x = 0$, we delete $\mu$ in the first term; $F(-1) = 0 = F(-2)$. Add over $x$, giving

$$(\partial/\partial t)\,P(x,t) = \mu p(x+1,t) - \lambda p(x,t)G(x-1). \qquad (1)$$
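The differential equations for $p(x,t)$ can be integrated numerically when the state space is finite. The sketch below uses a plain forward-Euler step, which is not a method proposed in the paper, and a hypothetical balking rule in which every customer has balking value $K = 2$, so that queues of length 0, 1 or 2 are always joined and a queue of length 3 never is. The function and parameter names are illustrative. The term $-\lambda p(x,t)G(x-1)$ used in the code is the same as $-\lambda p(x,t) + \lambda p(x,t)F(x-1)$ in the equation above.

```python
def integrate_queue(lam, mu, join_prob, n_states, t_end=50.0, dt=0.01):
    """Forward-Euler integration of dp(x,t)/dt for x = 0..n_states-1, starting empty.

    join_prob(x) is the chance that an arrival joins a queue of length x,
    i.e. G(x-1) in the notation of the text. It must be 0 at x = n_states-1
    so that the truncated state space is closed.
    """
    p = [1.0] + [0.0] * (n_states - 1)
    for _ in range(int(round(t_end / dt))):
        new = []
        for x in range(n_states):
            leave = mu if x > 0 else 0.0          # no departures from an empty queue
            rate_out = lam * join_prob(x) + leave
            flow_in = mu * p[x + 1] if x + 1 < n_states else 0.0
            if x > 0:
                flow_in += lam * join_prob(x - 1) * p[x - 1]
            new.append(p[x] + dt * (flow_in - rate_out * p[x]))
        p = new
    return p

# Hypothetical case: K = 2 for every customer, lambda = 2, mu = 1 (so rho = 2).
p = integrate_queue(lam=2.0, mu=1.0,
                    join_prob=lambda x: 1.0 if x <= 2 else 0.0, n_states=4)
print([round(v, 4) for v in p])
```

For this example the long-run probabilities are proportional to 1, 2, 4, 8, so a stable queue is obtained even though arrivals outnumber departures.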
imilar to the equation for a birth-and- 


Writing AG(z) = A,, (1) can be reduced to a form $ ; 
eath process given by Feller (1950), but differing both with respect to one subscript and 
mely 


Wi : 
th respect to the initial equation, na: 
; (6j0t) P(e, t) = AP œ- L- (wt Aca 
however, each person must join the queue, 50 that F(x) 
Pecial case of the birth-and-death equation. . 
we can give a method for computing 


sing a method suggested by Koopman (1953) 
Solutions of (1) in en where they exist. Let $(v.5) be the Laplace transformation of 


P(x, t). Transforming, (1) becomes 
— AG( — 2) $(s — 1,5) ( 
w 
a $(— 2,5) ne #(—-1,8) = 0, ô; = 1 fo 
Pty at t = 0. Writing le, 8) = $l s)d(e— 18) 


) P(w,t)+eP(@+ 1,0. 


— 0 for all z, then (1) becomes 


x4- AG(x — 1)+/) G(x, s) - P(e + 1,5) = p 
rz-0 and 0 otherwise, thus assuming the queue 
and using the equations corresponding to $x = 1, 2, \ldots$, it is seen that $\psi(1,s)$ may be written
as a continued fraction
W(1,8) =F 


1 & = By = 


Substituting this value back into the equation for $x = 0$, we find $\phi(0,s)$ and hence
$\phi(x,s)$. Values of $p(x,t)$ may then be found by the numerical inversion of the Laplace
transform.
3. EQUILIBRIUM DISTRIBUTIONS

If equilibrium distributions of queue length exist as $t \to \infty$, they will be denoted by
suppressing the letter $t$, and can be found from (1) by setting the left side equal to zero:

$$p(1) = \rho p(0), \qquad p(x+1) = \rho G(x-1)\,p(x) \quad (x = 1, 2, \ldots). \qquad (2)$$

Defining

$$c_0 = c_1 = 1, \qquad c_x = \prod_{j=0}^{x-2} G(j) \quad (x = 2, 3, \ldots),$$

this can be written

$$p(x) = \rho^x c_x\, p(0). \qquad (3)$$

Summing,

$$1 = p(0) \sum_{x=0}^{\infty} \rho^x c_x.$$

Kawata (1955) and Kendall & Reuter (1957) have shown that the convergence of the
series $\sum_{0}^{\infty} \rho^x c_x$ is necessary and sufficient for equilibrium to be attained, replacing the classical
condition $\rho < 1$ (which is contained); the condition is that $p(0) > 0$.

Thus, the tails of the balking distribution are proportional to the ratio of ordinates of
the queue length distribution, and when $\rho$ is known, either may be computed from the
other. Some examples will be given in § 5.

From (2), we have (if $p(x), p(x+1) \neq 0$)

$$f(x) = G(x-1) - G(x) = \frac{1}{\rho}\left(\frac{p(x+1)}{p(x)} - \frac{p(x+2)}{p(x+1)}\right), \qquad (4)$$

so we must have

$$p^2(x+1) \ge p(x)\,p(x+2). \qquad (5)$$

If p(n) +0 but p(n+1) = 0, then from (2) we have G(n—1) = 0 and 80 p(x) = 0 for all v 7 ?- 

This is the case where there 


is complete balking for queues of length » or more. Any finite 
-length satisfying (5) is attainable in this Way; the condition for 
equilibrium will be satisfied for all p. 

Any infinite distribution 


distribution of queue 


of queue-length (satis 


fying (5)) will correspond to a balking 
distribution (from (4)); there will be a positive probability that K is infinite (i.e. G(co) +9) 
unless 
lim p(x-- 1) = 0, (6) 
z>o P(x) 


Tf (6) is not satisfied, there is 


an upper 
c~ A(G(co))# 


limit to p for equilibrium to be attained. We have 
» 80 Lp%e, converges only if 


MEET TO 
P * Geo) "rh 


) when f(z) is given can 


The reverse problem, of finding p(x be solved, at least numerically: 
some algebraic difficult; 


in every case. However, there are 
analytical solution: (a) expressing pG(a) in a convenient, 
(c) summing the series Epey. 
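
The numerical route mentioned above can be sketched in a few lines of Python. This is only an illustration and not part of the original paper: the function names, the truncation length `n_terms`, and the convention that `f(x)` returns the balking probability Pr(K = x) are assumptions made for the example. The first function builds the ordinates from (2) and (3); the second inverts them through (4).

```python
from math import exp

def equilibrium_from_balking(f, rho, n_terms=60):
    """Queue-length ordinates p(0), ..., p(n_terms - 1) from eq. (3).

    f(x) is assumed to return the balking probability Pr(K = x);
    the infinite series is truncated at n_terms (an assumption).
    """
    weights = [1.0, rho]              # rho^x c_x for x = 0, 1 (c_0 = c_1 = 1)
    cum = 0.0                         # running value of F(x - 2)
    for x in range(2, n_terms):
        cum += f(x - 2)
        g = max(0.0, 1.0 - cum)       # G(x - 2) = Pr(K > x - 2)
        weights.append(weights[-1] * rho * g)
    total = sum(weights)              # equals 1 / p(0), up to truncation
    return [w / total for w in weights]

def balking_from_equilibrium(p, rho):
    """Balking probabilities from eq. (4); needs strictly positive ordinates."""
    return [(p[x + 1] / p[x] - p[x + 2] / p[x + 1]) / rho
            for x in range(len(p) - 2)]

# Example: the balking law f(x) = 1 / ((x + 1)(x + 2)) of the Poisson example
rho = 3.0
p = equilibrium_from_balking(lambda x: 1.0 / ((x + 1) * (x + 2)), rho)
f_back = balking_from_equilibrium(p[:20], rho)
```

With this balking law the ordinates should reproduce a Poisson distribution with mean $\rho$, and applying (4) to them should return the balking probabilities that were put in.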
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4. GENERATING FUNCTIONS 


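Let

$$\pi(s,t) = \sum_{x=0}^{\infty}s^x p(x,t) = (1-s)\sum_{x=0}^{\infty}s^x P(x,t), \qquad \xi(s,t) = \sum_{x=0}^{\infty}s^x p(x+1,t)F(x).$$

Summing (1) after multiplication by $s^x$, we obtain

$$\frac{1}{1-s}\frac{\partial}{\partial t}\pi(s,t) = \left(\frac{\mu}{s}-\lambda\right)\pi(s,t) - \frac{\mu}{s}\pi(0,t) + \lambda s\,\xi(s,t).$$

Letting $t \to \infty$ with the usual change of notation, we have (assuming equilibrium)

$$\pi(s) - \rho s\pi(s) - \pi(0) + \rho s^2\xi(s) = 0. \qquad (7)$$

With $s = 1$, $\pi(0) = p(0) = 1 - \rho + \rho\xi(1)$, and since $0 < p(0) < 1$,

$$\frac{\rho-1}{\rho} < \xi(1) < 1.$$

It will be noted that $\xi(1)$ is just the (asymptotic) probability that an individual balks. Recalling that $\rho$ was calculated for all arrivals, whether they joined or not, it can now be seen that an effective value of the traffic intensity, computed only from those who join the queue, say $\rho'$, can be written $\rho' = \rho - \rho\xi(1)$.

Writing now $p_\rho(x)$, $\pi_\rho(s)$, $m_\rho$ (mean queue length), etc., to denote the dependence on $\rho$, we have from (3)

$$\pi_\rho(s) = p_\rho(0)\sum_{x=0}^{\infty}\rho^x s^x c_x = \frac{p_\rho(0)}{p_{\rho s}(0)}, \qquad (8)$$

and using (7)

$$\xi_\rho(s) = \frac{p_\rho(0)}{\rho s^2}\left\{1 - \frac{1-\rho s}{p_{\rho s}(0)}\right\}. \qquad (9)$$

Thus the queue length and balking distributions are uniquely determined once $p_\rho(0)$ is known as a function of $\rho$.

We have from (3)

$$\rho\frac{\partial}{\partial\rho}\log p_\rho(x) = \rho\frac{\partial}{\partial\rho}\{\log c_x + x\log\rho + \log p_\rho(0)\} = x - m_\rho, \qquad (10)$$

which essentially determines $p_\rho(x)$ (and in particular $p_\rho(0)$) when $m_\rho$ is known (as a function of $\rho$). Also, the variance of the queue length,

$$v_\rho = \rho\frac{\partial m_\rho}{\partial\rho} = \sum_{x} x\,p_\rho(x)(x - m_\rho) \qquad (11)$$

(since $\sum p_\rho(x) = 1$).

$\pi_\rho(s)$ and $\xi_\rho(s)$ each satisfy a simple partial differential equation; from (8) we have

$$\log\pi_\rho(s) = \log p_\rho(0) + g_1(\rho s),$$

whence

$$\left(\rho\frac{\partial}{\partial\rho} - s\frac{\partial}{\partial s}\right)\log\pi_\rho(s) = \rho\frac{\partial}{\partial\rho}\log p_\rho(0) = -m_\rho.$$

Also from (9)

$$\log\xi_\rho(s) = \log p_\rho(0) - \log\rho - 2\log s + g_2(\rho s),$$

whence

$$\left(\rho\frac{\partial}{\partial\rho} - s\frac{\partial}{\partial s}\right)\log\xi_\rho(s) = 1 - m_\rho.$$

Here $g_1(\ )$ and $g_2(\ )$ are used to denote 'a function of' and 'another function of' in the above equations.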
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Differentiating (7) and setting s = 1 gives 


$$m_\rho = \frac{\rho}{1-\rho}\{1 - \xi'(1) - 2\xi(1)\}. \qquad (12)$$

Differentiating (7) twice and setting $s = 1$ gives

$$v_\rho = m_\rho - m_\rho^2 + \frac{\rho}{1-\rho}\{2m_\rho - 2\xi(1) - 4\xi'(1) - \xi''(1)\}. \qquad (13)$$
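
As a rough numerical check of the mean formula (12) and of the effective traffic intensity $\rho' = \rho - \rho\xi(1)$, one can compute both sides directly. The Python sketch below is illustrative only: it assumes the reading $\xi(s) = \sum_x s^x p(x+1)F(x)$, so that $\xi(1) = \sum_x p(x+1)F(x)$ and $\xi'(1) = \sum_x x\,p(x+1)F(x)$, and the function name, the truncation length and the test distributions are choices made here, not taken from the paper.

```python
def queue_summary(f, rho, n_terms=200):
    """Return (p(0), xi(1), mean by direct summation, mean from eq. (12)).

    f(x) is assumed to give Pr(K = x); rho must differ from 1 because
    eq. (12) divides by (1 - rho). Series are truncated at n_terms.
    """
    # Cumulative balking probabilities F(x) = Pr(K <= x)
    F, cum = [], 0.0
    for x in range(n_terms):
        cum += f(x)
        F.append(min(1.0, cum))
    # Unnormalised ordinates rho^x c_x from eq. (3), using G = 1 - F
    w = [1.0, rho]
    for x in range(2, n_terms):
        w.append(w[-1] * rho * (1.0 - F[x - 2]))
    total = sum(w)
    p = [v / total for v in w]
    xi1 = sum(p[x + 1] * F[x] for x in range(n_terms - 1))        # xi(1)
    dxi1 = sum(x * p[x + 1] * F[x] for x in range(n_terms - 1))   # xi'(1)
    m_direct = sum(x * px for x, px in enumerate(p))
    m_eq12 = rho * (1.0 - dxi1 - 2.0 * xi1) / (1.0 - rho)
    return p[0], xi1, m_direct, m_eq12

# Geometric balking with parameter 10/11 and traffic intensity 3
geo = queue_summary(lambda x: (1.0 / 11.0) * (10.0 / 11.0) ** x, 3.0)
# Everyone insists on immediate service (K = 0), traffic intensity 2
imm = queue_summary(lambda x: 1.0 if x == 0 else 0.0, 2.0)
```

In both cases the directly summed mean and the value from (12) should agree, and $p(0)$ should equal $1 - \rho\{1 - \xi(1)\}$.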


5. SOME EXAMPLES 


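(i) Binomial queue
Condition (5) is satisfied by the finite distribution

$$p(x) = \binom{n}{x}\varpi^x(1-\varpi)^{n-x} \qquad (x = 0, 1, \ldots, n),$$

where $0 < \varpi < 1$. We find

$$\varpi = \frac{\rho}{n+\rho}.$$

The corresponding balking distribution is given by

$$G(x) = \begin{cases}\dfrac{n-x-1}{n(x+2)} & (0 \leq x \leq n-1),\\[4pt] 0 & (n \leq x),\end{cases} \qquad f(x) = \begin{cases}\dfrac{n+1}{n}\,\dfrac{1}{(x+1)(x+2)} & (0 \leq x \leq n-1),\\[4pt] 0 & (n \leq x),\end{cases}$$

and is independent of $\varpi$. As $\rho \to \infty$, $\varpi \to 1$ and the queue is almost always of length just $n$.
The relations (8) to (11) may be verified immediately: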


so. ve te). ue 2 


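(ii) Negative binomial queue
An example of an infinite queue-length distribution is

$$p(x) = \binom{N+x-1}{x}\gamma^x(1+\gamma)^{-N-x} \qquad (x = 0, 1, 2, \ldots),$$

where $\gamma > 0$ and $N \geq 1$. We have

$$\lim_{x\to\infty}\frac{p(x+1)}{p(x)} = \frac{\gamma}{1+\gamma}, \qquad \rho = \frac{N\gamma}{1+\gamma}, \qquad G(\infty) = \frac{1}{N},$$

so $0 < \rho < N$, and

$$m_\rho = \frac{N\rho}{N-\rho} \to \infty \quad \text{as } \rho \to N.$$

The balking distribution is

$$G(x) = \frac{N+x+1}{N(x+2)}, \qquad f(x) = \frac{N-1}{N}\,\frac{1}{(x+1)(x+2)} \qquad (x = 0, 1, 2, \ldots),$$

with a probability $1/N$ at $x = \infty$. The balking distribution is independent of $\gamma$:

$$p_\rho(0) = \left(\frac{N-\rho}{N}\right)^N, \qquad m_\rho = \frac{N\rho}{N-\rho}, \qquad v_\rho = \frac{N^2\rho}{(N-\rho)^2}.$$

(iii) Poisson queue
Each of the queue distributions in (i) and (ii) approaches the Poisson form as $n, N \to \infty$:

$$p(x) = \frac{\rho^x e^{-\rho}}{x!} \qquad (x = 0, 1, 2, \ldots).$$

We now have $G(\infty) = 0$, so $0 < \rho < \infty$; $m_\rho = \rho = v_\rho$. The balking distribution becomes

$$G(x) = \frac{1}{x+2}, \qquad f(x) = \frac{1}{(x+1)(x+2)} \qquad (x = 0, 1, 2, \ldots),$$

which may be regarded as a discrete analogue of the Cauchy distribution. Here

$$p_\rho(0) = e^{-\rho}, \qquad \pi_\rho(s) = e^{-\rho+\rho s}.$$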
(iv) Type III ordinates 


Another possible infinite queue-length distribution is 
ple) = A(s-aye?* (w= 0,1,2.) 


where a, v, A > 0; A is a complicated function of a, v and À. Here 
Ta y ax ly va so 0<p< arp 
aao. nef n n 
The balking distribution is 
a(at+a+2) | (v = 0,1,2,...) 
w AETAT | 2955... 

ae = Fence 


Which is independent of A. For p near (a+1)"/a", we have approximately 


p+l 
al 
mM, = Tati a 
e| (= 


If we let »-> 0 we obtain the classic case where every ar 


- (m, +a)? 
mc EI 


rival must join the queue 


er ispa 


1, pee 
p(z) i-e , Pp 


G(oo) = 


(v) Normal ordinates
A very simple result is obtained when we assume that the queue-length distribution is

$$p(x) = A\exp\left\{-\frac{(x-m)^2}{2v}\right\} \qquad (x = 0, 1, 2, \ldots),$$

where $m$ and $v$ are very nearly the mean and variance of the distribution if $v$ and $m^2/v$ are 'large', say both $> 9$. We find

$$G(\infty) = 0, \qquad \rho = \exp\{(m-\tfrac12)/v\}, \qquad 0 < \rho < \infty.$$

The balking distribution is Pascal (geometric):

$$G(x) = \exp\{-(x+1)/v\}, \qquad f(x) = (1-\lambda)\lambda^x \qquad (x = 0, 1, 2, \ldots), \qquad (14)$$

where $\lambda = \exp(-v^{-1})$, and is independent of $m$. Denoting the mean of this distribution by $M = \lambda/(1-\lambda)$, we have

$$m_\rho \doteq \tfrac12 + v\log\rho, \qquad v_\rho \doteq v = \left[\log\frac{M+1}{M}\right]^{-1}. \qquad (15)$$

The relation $m_\rho = \text{const.} + v_\rho\log\rho$ cannot hold exactly for any distribution whatever, as from (11) it implies first $v_\rho = c = \text{const.}$ and then $m_\rho < 0$ for $\rho < \exp(-\text{const.}/c)$.
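
The quality of the approximation (15) can be examined numerically. The sketch below is a hedged illustration rather than the author's computation: it assumes the Pascal balking law (14) with parameter `lam`, for which (3) gives ordinates proportional to $\rho^x\lambda^{x(x-1)/2}$, and the function names and the truncation length are assumptions of the example.

```python
from math import exp, log

def pascal_queue_exact(rho, lam, n_terms=400):
    """Mean and variance of queue length for Pascal balking, by direct summation.

    Ordinates are taken proportional to rho^x * lam^(x(x-1)/2), which is
    eq. (3) with G(x) = lam^(x+1); logs are used to avoid overflow.
    """
    logs = [x * log(rho) + 0.5 * x * (x - 1) * log(lam) for x in range(n_terms)]
    top = max(logs)
    w = [exp(v - top) for v in logs]
    total = sum(w)
    mean = sum(x * wx for x, wx in enumerate(w)) / total
    var = sum((x - mean) ** 2 * wx for x, wx in enumerate(w)) / total
    return mean, var

def pascal_queue_approx(rho, pascal_mean):
    """Approximate mean and variance from formulae (15)."""
    v = 1.0 / log((pascal_mean + 1.0) / pascal_mean)
    return 0.5 + v * log(rho), v

# Pascal balking distribution with mean 10, i.e. parameter 10/11
exact_mean, exact_var = pascal_queue_exact(3.0, 10.0 / 11.0)
approx_mean, approx_var = pascal_queue_approx(3.0, 10.0)
```

For a Pascal balking distribution with mean 10 the approximate variance is $1/\log 1.1 \approx 10.492$, and at $\rho = 3$ the exact and approximate means should differ only slightly.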


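(vi) Deterministic balking
One case in which these calculations are easy is the following: each person has exactly the same degree of queue resistance, i.e. the balking distribution is deterministic: $f(x) = 1$ for $x = K$ and zero otherwise. Although rather trivial, this case will be important in an application to be mentioned subsequently. We have

$$G(x) = \begin{cases}1 & (0 \leq x \leq K-1),\\ 0 & (K \leq x),\end{cases} \qquad c_x = \begin{cases}1 & (0 \leq x \leq K+1),\\ 0 & (K+2 \leq x),\end{cases}$$

so

$$p_\rho(0) = \frac{1-\rho}{1-\rho^{K+2}}, \qquad p(x) = \begin{cases}\dfrac{(1-\rho)\rho^x}{1-\rho^{K+2}} & (0 \leq x \leq K+1),\\[4pt] 0 & (K+2 \leq x).\end{cases}$$

From (8) and (9), or direct from the definitions, we have

$$\xi_\rho(s) = \frac{1-\rho}{1-\rho^{K+2}}\,\rho^{K+1}s^K, \qquad \pi_\rho(s) = \frac{1-\rho}{1-\rho^{K+2}}\,\frac{1-(\rho s)^{K+2}}{1-\rho s}.$$

Hence, from (12) and (13), or from (10) and (11), we have

$$m_\rho = \frac{\rho}{1-\rho} - \frac{(K+2)\rho^{K+2}}{1-\rho^{K+2}}, \qquad v_\rho = \frac{\rho}{(1-\rho)^2} - \frac{(K+2)^2\rho^{K+2}}{(1-\rho^{K+2})^2}.$$

In the trivial situation where each individual insists on immediate service, $K = 0$, and

$$m_\rho = \frac{\rho}{1+\rho}, \qquad v_\rho = \frac{\rho}{(1+\rho)^2}.$$

In the classic situation, in which each individual must join the queue, $K = \infty$, and

$$m_\rho = \frac{\rho}{1-\rho}, \qquad v_\rho = \frac{\rho}{(1-\rho)^2}.$$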
Each of (i) to (v) above has led to a balking distribution which is unimodal with mode at $x = 0$. It seems very difficult to obtain explicit results for a balking distribution with mode not zero; numerical calculations were carried out for $\rho = 2, 3, 4, 5, 6$ for each of the following balking distributions:
(vii) Poisson with mean 10.
(viii) Ordinates of a lognormal distribution $f(y)$ where $5(\log y - 1)$ is a unit normal variable.
(ix) Ordinates of a $\chi^2$ distribution with 10 degrees of freedom.
(x) The Pascal distribution (14) with $\lambda = 10/11$.

Table 1 shows the means and variances of the queue-length distributions obtained, together with those of the 'input' balking distributions. Also, in the last column, are given the corresponding values for the Pascal distribution (x) obtained from (15). It will be seen that the approximation is very close except when $\rho = 2$; here $m^2/v$ is only 5.5.


Table 1. Mean and variance of queue length for various balking distributions

                Poisson      Lognormal    χ²           Pascal       Formulae (15)

Input
  Mean          10           10.57        9.498        10           10
  Variance      10           19.5384      19.23        110          110
ρ = 2
  Mean          10.320206    10.212608    9.475843     7.823407     7.772564
  Variance      4.908937     6.506500     6.110196     9.726074     10.49205
ρ = 3
  Mean          11.941052    12.578738    11.683753    12.026812    12.026641
  Variance      3.351001     5.382238     4.725208     10.467543    10.49205
ρ = 4
  Mean          12.826379    14.078101    12.914084    15.041116    15.045024
  Variance      2.944219     5.074961     4.248282     10.490947    10.49205
ρ = 5
  Mean          13.430370    15.192921    13.830718    17.363218    17.380325
  Variance      2.582158     4.929776     4.032884     10.463681    10.49205
ρ = 6
  Mean          …            …            …            …            …
  Variance      2.416736     4.858583     3.839912     10.533106    10.49205
ρ = 7
  Mean          14.248864    16.531880    15.130198    20.919959    20.916585
  Variance      …            4.813526     3.28680      10.485757    10.49205


6. SOME GENERALIZATIONS 

5 Interesting application of queueing with balking is furnished by the problem of a 
quence of transporting mechanisms which move discrete units of cargo. In the termino- 
Ogy developed by the Department of Engineering of the University of California at Los 
Y pm (1953), each transporting agency constitutes à ‘link’, and the place where two such 
ite Chanisms transfer their loads is a ‘node’. If there is room at a node for the storage of S 
P the number in storage at time may be regarded Cr ae or lengi * SONA n 
e link setting down items carries A, units at 2 time, and the link picking up items carries 
E vent the process stopping alto- 


28684 à 
a time, then we must assume S sM gá h to pre 


Sether ; , ich we absorb the time required 
- The tim ` ; f the links (in which we quire: 
or pi e time required for the ee o arrival and inter-departure times of the queue 


pon is such that the arriving link 
e nod ;(t) at the node is suc’ i ing 
anog Set iE them. v iet eom hod goes back and arrives again after 
its cargo, Le. — , om i 

other inter-arrival ae Tf the storage when EC ane ed 
a full load is not available, it also goes back and forth un cur gu cy 
or (25, consideri ii nks separated by one node, the following generalizations 

ering only two links sep Jap introduction are suggested: (a) bulk 


e ; x 
P T as à ; z ur 
Tobabilistic queueing model discussed in © m 
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H . Ss d de- 
ivals and departures, (b) balking distributions associated with both arrivals a a 
iem (c) à finite number of states possible. Also, the cargo-handling case req 


: ialize for 
deterministic balking; we shall begin with stochastic balking, and then specialize 
cargo-handling. 


3 E METS to the 
Let K, and K, be two integers such that when A, items arrive, they are added 


i rives, it takes 
queue when k(t) < K, and not otherwise, and when the removal mechanism arrives, it 
Y " 
away A, items if k(t) > K, and not otherwise. Let F(x) = Pr(K, <a) = 1— G(x). 
Using the same argument that was employed in § 2, 


p(u,t+At) = (1—AAE — At) p(x, t) 


FAA(L— At) (p(x, t) F. (x — I) p(x— A, Gí(v— A,— 1)] 

FUAI- AL) p(as t) Ga) + pas 4- Ag, t) Fc + 45)] +0(At), 
where departures as well as arri 
Special cases not only for x = 
with by conventions regarding 


ave 

vals are now partitioned into two cases. Also, we be 
0, but for x less than A, or A,. However, this can be 
negative arguments. We put 

P(e)=0 for x<0 or 

F(r)—-0 for g< 0, 

F(z) =0 for v«0. 
Passing to the limit, and letting t+ 00 ( 


r8, 


assuming equilibrium) we obtain 
0 7 6,2 4, 1) pla — Ay) + (pG,(— 1) Fale) D(x) — Fo 4- Ay) p(w + Ap) 


(x = 0,1,..-,8)- 


p= {p(0), p(1), +++) p(S)}, a= {0, 0, e.g 


1} 
and B is an (S+ 1) x (S+ 1) matrix having zeros 


e 
everywhere except in the last row Wi h 
all the elements are unity), the principal diagonal, the Asth Super-diagonal, and the Ay 
subdiagonal. f 

If A, = A, = 1, these equations can be solved exactly as in $3; only a redefinition 4 
€, is needed 


z—2 E 
Cz = I eon Gaj). 


In case of arbitrary A, and A,, but with deterministic balkin 


K,-S—A, 
Sgested by the cargo-h; 


g defined by 
K, = A, 

(which are the values su 
a8 follows: 
(a) the elements in the A 
(5) the elements in the A 


(c) the principal diagona 


; : m plifie? 
andling Problem), the matrix B simp}! 


1th subdiagona] become — 
oth super-diagon, 


al become — 1; 
l consists of —A 


2 elements of value $\rho$, followed by $S - A_1 - A_2 + 1 \geq 0$ elements of value $1 + \rho$ and, finally, $A_1$ elements of value 1.

I have evaluated this determinant only in special cases, but offer the following conjectures about the polynomial $B(\rho; A_1, A_2, S)$:
(i) it vanishes unless $A_1$ and $A_2$ are relatively prime;
(ii) if $A_1$ and $A_2$ are relatively prime, it consists of $S - A_1 - A_2 + 3$ terms, beginning with a term of degree $S - A_1 + 1$ and ending with a term of degree $A_2 - 1$;
(iii) $B(\rho; m, n, S) = \rho^S B(\rho^{-1}; n, m, S)$;
(iv) the first and last coefficients are $A_1$ and $A_2$, respectively;
(v) $B(\rho; 1, 1, S) = \sum_{j=0}^{S}\rho^j$.

I wish to thank Drs F. N. David, D. G. Kendall, C. L. Mallows and R. Bellman for their helpful suggestions.
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TESTING FOR SERIAL CORRELATION IN SYSTEMS OF 
SIMULTANEOUS REGRESSION EQUATIONS 


By J. DURBIN

Research Techniques Unit, London School of Economics and Political Science

1. INTRODUCTION

Much attention has been given in recent years to the problem of fitting economic models and
this has led to the study of systems of simultaneous regression equations of the form

$$Ay = Bx + \epsilon, \qquad (1)$$


ector of predetermined or inde- 
; "S. 
matrices of unknown pan 
A h 

tthe e’s are random variables wi 


s. Given the model (1) we usually require to 


bservations of y and a and to make 
confidence statements about the estimates. 


In some formulations certain of the z's coincide 


The specification (1 
calls for simil. 
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estimation procedure. With this reservation, when the errors are normally distributed and 
satisfy reasonable assumptions, the estimators so obtained are maximum likelihood 
estimators. 

A method for the simultaneous estimation of all the parameters of the system which 
makes use of all the information available about the restrictions on the elements of A and 
B has been developed by Koopmans and others (1950), but it requires extremely heavy 
computations and has other disadvantages which make it unlikely that the method will be 
found useful in practice. Consequently, it will be assumed in this paper that the equations 
under study have been fitted by reduced-form or limited-information methods. 

These methods are extensions of the method of least squares to the simultaneous equations 
case, and like least squares depend for their validity on the assumption that the errors are 
serially uncorrelated. If this assumption does not hold, the estimators are not efficient and 
confidence regions calculated without taking the serial correlation into account may be 
highly misleading. Tests of serial correlation should therefore form part of any analysis in 
which the observations consist of time series. A small-sample test for single-equation
models has been given by Durbin & Watson (1950, 1951). The purpose of the present paper
is to extend this test to cover simultaneous-equation models in which the parameters have
been fitted by the methods referred to.

It is, of course, possible to transform (1) by multiplying through by $A^{-1}$ to give the system

$$y = A^{-1}Bx + A^{-1}\epsilon, \qquad (2)$$

in which each equation contains only a single dependent variable. If the original errors $\epsilon$ are serially uncorrelated, so are the transformed errors $A^{-1}\epsilon$ and each equation may be fitted separately by least squares. Consequently, the test appropriate to single-equation models may be applied to each equation separately and in this way it appears that the hypothesis that all the $\epsilon$'s are serially uncorrelated may be investigated. In view of this, it may be asked why any special consideration need be given to simultaneous-equation models, since on the face of it they can be dealt with by existing methods.

In the first place there is no method available for combining the results from the separate tests into an overall test for all the errors in the model. Secondly, even if an overall test could be constructed it would not be suitable for cases in which our attention is focused on a specific equation of the model. What is required is a test which has high power against alternatives affecting estimators of the parameters in that equation. As will be seen below, for a sample of $n$ observations these estimators depend on a vector of $n$ variates which are functions of the errors. The estimation procedure resolves this vector into a component lying in a space of $r$ dimensions, $r$ being the number of independent variables in the model, and a component in a space of $n - r$ dimensions. With serially independent errors, the efficiency of the estimates and the validity of confidence statements about them depends on the property that these components are randomly directed in their spaces. The effect of departure from serial independence is to make the directions non-random. Now the direction of the estimation component depends on the parameters that are being estimated, whereas the direction of the residual component is independent of unknown parameters. Thus, it seems reasonable to base the test of serial dependence on the distribution of the residual component. The statistic $d$ used in the tests described below measures the direction of this component and is known to have high power against Markoff alternatives. Details of its power properties may be found in the paper by Durbin & Watson (1950).

A reference to a procedure attributed to H. Rubin for testing serial correlation in simultaneous-equation models is made on pp. 70 and 73 of the book by Klein (1950). Details are not given, but the method is based on the lag correlations of the complete set of residuals from all equations of the system, and so is quite different from the test described in this paper. Presumably Rubin's is a large-sample test, whereas the test described below is valid for small samples.


2. JUST-IDENTIFIED EQUATIONS 
Suppose that the model (1) has $p+1$ equations and that we wish to estimate the parameters
in the first of these equations, which we assume is just-identified. Suppose also that the
vector $x$ has $k+p$ elements ($k > 0$). It is convenient to write (1) in the transposed form

$$y'A' = x'B' + \epsilon'. \qquad (3)$$

The model for $n$ observations is then

$$YA' = XB' + E, \qquad (4)$$


matrix of errors. Th 


s ee 
ij] Which ensure identificatio 
Prva = + = Bega = on = Pran. = 0. 


Representing column vectors by small letters and 
be written in the form 


are a, = l and 


matrices by capital letters (4) can then 


Let X, denote the part of X£ 


2 orthogonal to X se. X, = x* ! X yx: xg. (5) cat 
then be rewritten as i 2— XXL Xy): Xi 


on Xp, i.e. by setting X^(y, + Y,a) = 
is given by the regression of yi 
the fitted equation ( 


a 
To the regression of yı + Ys 


nx 291. The estimator © 


Xi(y, + Y,a). The set of n residuals fro? 
VtYa-Xx 10: 


Op, Which gives q = _ (X.Y, 
Y,a on X3 L8, bs 
6) is then given byz- 
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As in the single- ion cas W o! 951) the test f 

¢ e equatio case (Du bin & est for serial correlation 
g rom atson, 1950, 1 i 

performed by calculating the statistic ? i 


i ) 
x (2; —2i3Y* 


qa 
n 
Ez 
del (8) 
| zDz 
ux 
Where D is the matrix T4 Qu 9 
=J 2-1 : 
D % . 0 
z —1 2 —1 
0 iss 0-1 1 


We 
shall show that in the usual case in which one of the columns of X, is the vector of 1’ 
S 


l 


; the significance of d may be tested by entering Durbin & Watson’s (1951) tables with 
i 


1 
: " k+p-1. 
ost-multiplying (4) by (4’)* we have 
otb Y= XC+F, 
£, in the form corresponding to (7), 
[y, į Y] =a: X,]1C 4 (f i fi. (9) 


Substitutine ; 
tituting in z=, 25a 


z= X, +X +f 


we have 
whi 
ere c, and c, are given vectors depending on d. b and f = fi+ Fa. 
L a 
E p^ K be an n —k—p xn matrix such that H — l ;| is an orthogonal matrix, and let 
2. Then K 
xi 
Ec [X5 t+ X361 
K 


ro means and constant variance 


wl ; 
lere w is the n—k—p vector Kf. 
he rows of # are JN ZC vectors, 


Sei of independent normal vari 
80 are ie matrix) be called LN ZO variat 
Gona rows of F, and since H is orthogo 
er the distribution of u conditional 


s) with ze 
). Since t 
he rows of H. F. 

k+p rows of HF being held fixed. 


h reduces to c; + X(f + Fa) = 0,. 


ariates (vector 
es (vectors 
nal so are t 

on the first 


E ONE. ips i 
A equation for @ is X5(J Y,a) = Op» bier : 
he value of a depends only on the first +p rows of HT, and istre iore aoe p 
fixed quantity. Now u depends 


tionally as if a were a 


Xed, 
Consequently u behaves condi 
HF. Since t! HF are INZC vectors of whose 


Y on aand the last n— k — p rows of he rows of 


ion in si "essi tions 
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" cs 1 3 -k-p 
l ts u is a linear combination, it follows that w is conditionally a vector of n 
elemen 
INZC variates. u 
Since £ — Hz, we havez — H'£. Substituting in (8) we have 
a= ZHDH't 
7 
_ win 
P7" 


where Lis an n -k—pxn—k—p matrix. 
Let N be the orthogonal matrix diagonalizing L, i.e., 


Vy 


N'LN = Ye 


Vr—k—p 


U, i.e. u = Ny. Then 


the blank spaces Tepresenting zeros. Let 7 = N’ 
a " 
Vz 
Bah c (10) 
d= aes H 
PET 
ps aal 
Where 7, ..., 7,4» are conditionally INZC variates, Denote their common (conditional) 


variance by c? and let & — qo (i — 1, ++2—k—p). Then e 


eoe a are independent 
N(0, 1) variates, Substituting in (10) 


we have 
n—-k-—p 
2 
2 ng 
d= fi 


t (11) 


Now the INZG property of £,, "tale m depended on the rest; 
first p rows of HF, whereas pisse 


restrictions. We may therefore rela 
will hold unconditionally, 

d is now in the form cons 
especially pp. 41 1-13, and 1 


rietions imposed on M 
) independently of es 
bution of d defined by ( 


n-k-5 are independent NV. (0,1 
X the restrictions and the distri 


idered for the single-equation case by Durbin & Watson (195 
951). It is shown 


in these papers that d L,€d € dy, where 


n—k-—p n-k-p5 
2 AG 2i Asus 
Lp (and dy = E EE , 
x & xc 
i=1 dep A 
A <Ag<... <A, 


Significance e 
k+p-1 = k'. It may be noted that in each cae 
88 than the tabulated value o 


Íd, is 
n the tabulated d 


v is declared non 


method requires the calculation of t] 


he 
to the present case. T! 
he first two mome 


8 of the matrix X — A © | 
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deviations from the means of the k+p—1 independent variables (other than that corre- 
sponding to the mean) in the model. For tests against negative serial correlation and two- 
sided tests the procedures given by Durbin & Watson may be followed. 

The results of this section may be summarized in the following rule:
When a just-identified equation has been fitted by the reduced-form method a test of serial correlation may be performed by calculating

$$d = \frac{\sum_{i=2}^{n}(z_i - z_{i-1})^2}{\sum_{i=1}^{n}z_i^2},$$

where $z_1, \ldots, z_n$ is the set of observed residuals, and entering the tables of $d_L$ and $d_U$ (Durbin & Watson, 1951) with $k'$ equal to the number of constants fitted in addition to the mean.
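
The statistic in this rule is simple to compute from a list of residuals. The following Python sketch is an illustration only: the function names are assumptions made here, and the second function evaluates the equivalent quadratic-form expression $z'Dz/z'z$ with the tridiagonal matrix $D$ applied row by row rather than stored. It does not look up the tabulated bounds $d_L$ and $d_U$.

```python
def durbin_watson(z):
    """d = sum_{i=2}^{n} (z_i - z_{i-1})^2 / sum_{i=1}^{n} z_i^2."""
    num = sum((z[i] - z[i - 1]) ** 2 for i in range(1, len(z)))
    den = sum(v * v for v in z)
    return num / den

def durbin_watson_quadratic_form(z):
    """Same statistic written as z'Dz / z'z.

    D has diagonal (1, 2, ..., 2, 1) and -1 on the first super- and
    sub-diagonals; each row of D is applied to z in turn.
    """
    n = len(z)
    zDz = 0.0
    for i in range(n):
        diag = 1.0 if i in (0, n - 1) else 2.0
        row = diag * z[i]
        if i > 0:
            row -= z[i - 1]
        if i < n - 1:
            row -= z[i + 1]
        zDz += z[i] * row
    return zDz / sum(v * v for v in z)

# Strongly negatively autocorrelated residuals give a large d
residuals = [1.0, -1.0, 1.0, -1.0]
d_direct = durbin_watson(residuals)
d_matrix = durbin_watson_quadratic_form(residuals)
```

Values of $d$ near 0 point towards positive serial correlation and values near 4 towards negative serial correlation; the decision itself still requires the tabulated bounds.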


3. OVER-IDENTIFIED EQUATIONS 


When the number of independent variables with zero coefficients in the first equation is
greater than $p$ the equation is over-identified. The model for $n$ observations is, as before,

$$YA' = XB' + E,$$

where $Y$, $E$ are $n \times (p+1)$ matrices but $X$ is now $n \times (k+q)$ with $q > p$. The restrictions on
Which give rise to over-identification are Boana = Pisa 7 = Baga =i, 
Equation (4) can again be written in the form 


(12) 


[yi i Ys] | F 
Where X = [X; : Xž]and X, = Xš- X,(X1X)7 X{Xf. If the columns of X, and X, are not 
already normalized and orthogonal we may transform to new independent variables with 
fall Property without affecting the values of the residuals on which the serial correlation 
test is to be based. 

The limited-information estimator of $\alpha$ is the vector $a$ such that in the Euclidean representation of the $[n-k]$-dimensional space orthogonal to the $[k]$ space spanned by the columns of $X_1$ the angle between the projection on the $[n-k]$ space of the vector $y_1 + Y_2a$ and the $[q]$ space spanned by the columns of $X_2$ is a maximum. It may be noted that in the just-identified case, for which $q = p$, this angle is a right angle; when $q > p$, the angle will in general be less than a right angle. The limited-information estimator $b$ of $\beta$ is given by the regression of $y_1 + Y_2a$ on $X_1$, i.e. $b = X_1'(y_1 + Y_2a)$.

let f= a ha —X; b) where K is such that H — 2 is an orthogonal matrix. 

Tila = a where v = X1(y, 4 Y,a) is the projection of the residual vector z on the X, 
, = PAS BG 2 

“Pace an $ i ysof F = [fy F] = E(4)? are INZO vectors. 

du = K(f, 4- F,a) where, asin $2, therov Be aie Mies ah are: 


e cho enimi ^olu'au). AS a function 0 A 2 
imu c am E he elements of the p--1x p 1 matrix P'K'KF. 


Ine 
r n of X7 P and u'u depends only on t Pand PK'KF 
» @ depends only on the elements of X; F an z ) 
Ow the rows eens INZC vectors independent mo graer ges Ai cape dcr ig 
t density depends only on the elements of the matrix of EUIS pd 
pends only on t KF is held constant are equally 


join 
ely PE . Consequently, all samples for which PK 
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ikely. Now the elements of F' K' KF fix the lengths and relative inclinations of the ru 
E tit ting the columns of KF. Regarding these vectors as a set of axes, if F'K'KF is sn 
Poner eins of the axes is fixed, but all orientations of the axes in the n —k- A pes a 
equally likely. Consequently, any vector whose orientation is fixed relative to the a3 
irected in the n — k —q space. : 
idum the distribution of u conditional on the X; F and FK 'KF held arms 
ais a fixed vector, so that u = K(f, + F,a) is conditionally a fixed linear combination 


; ; i joint 
columns of KF. It follows that u is randomly directed in the n— : — gq space, i.e. the jo 
distribution of the elements of u is spherically symmetric. 


; ; X» 
Let $b_2$ denote the vector of least-squares coefficients of the regression of $y_1 + Y_2a$ on $X_2$, i.e. $b_2 = X_2'(y_1 + Y_2a)$, and let $z^* = y_1 + Y_2a - X_1b - X_2b_2$. Then $z^*$ denotes the set of residuals from the regression of $y_1 + Y_2a$ on $X_1$ and $X_2$. The test for serial correlation is performed by calculating the statistic

$$d = \frac{\sum_{i=2}^{n}(z_i^* - z_{i-1}^*)^2}{\sum_{i=1}^{n}z_i^{*2}}.$$

Let £* = Hz*. Then £*- er As in (10) 


where 7 = N’u, N' being an orthogonal matrix. R 
Consider the conditional distribution of d for fixed X 2F and F'K'KF, Since N' is a 
gonaland points ware equi-probable on the sphere w'u = constant, points 7 are equi-proba 
= 
on the sphere 7^7 = constant. Put qf = ny 9). Thend = wv 


s * are 
= NM v2, where points y" ® 
i-1 
equi-probable on the unit sphere. Sincesf, 


+ 1 4 , 
++» Ting ave independent of X LF and F'K M 
the unconditional distribution of d when X aF and F'K'KF are allowed to vary is identic 
with the conditional distribution when the 


se quantities are held fixed. d 
Let w be a X variate with n—k— q degrees of freedom independent of qt, ess ee e 
let 6, = why (i = 1, s b q). Then £, ..., C i, are independent N(0, 1) variates a 


n—k—q 
à vg 
"ECL 
^ n—k-q 
x 
i=1 
This is in the same form as the expression (11) obtained for the just-identified case a 
follows that an observed d may be : 


k' = k+q—1. When the test is in 
asin the single-equation case, bas 


e^ the 
; — l matrix X — X of deviations from 
sample means of the k -+ q—1 independent variable 


bin & Watson’s (1951) tables wi à 
H ge 
conclusive the same 


jor" 
ne test that the columns Ji Mg TOE ig 




test additio 
nal to those necessary i 
E ry for caleulating the limited-i i i 
ond s the treatment given above ced ee LE 
The usual computing procedure for the limited-information method (Klein, 1950, p. 169) requires the calculation of the matrix

$$W = Y'Y - Y'X(X'X)^{-1}X'Y,$$

where $Y$, $X$ are the matrices of deviations from means of observations of dependent variables in the equation and independent variables in the model (whether in the equation or not) respectively. Normally the matrix product $Y'X(X'X)^{-1}X'Y$ is calculated by the Doolittle procedure as described by Klein. When a serial-correlation test is required it will be more convenient to calculate $(X'X)^{-1}X'Y$ first and obtain the product by premultiplication by $Y'X$. Apart from this the vectors $a$, $b$ of limited-information estimates are calculated in the way described by Klein.

Let $a^*$ denote the $(p+1) \times 1$ vector $\begin{bmatrix}1\\ a\end{bmatrix}$. Then the set of residuals $z^*$ on which the test is based is the set of residuals from the least-squares regression of $Ya^*$ on $X$, i.e.

$$z^* = \{I - X(X'X)^{-1}X'\}Ya^*.$$

It is worth noting that the sum of squares $\sum_{i=1}^{n}z_i^{*2}$ is equal to $a^{*\prime}Wa^*$.
n in the following rule: When the equation 


We may summarize the results of this section in the following rule: When the equation $y_1 + Y_2\alpha = X_1\beta + \epsilon$ has been fitted by the limited-information method, a test of serial correlation may be performed by calculating

$$d = \frac{\sum_{i=2}^{n}(z_i^* - z_{i-1}^*)^2}{\sum_{i=1}^{n}z_i^{*2}},$$

where $z_1^*, \ldots, z_n^*$ is the set of residuals from the multiple regression of $y_1 + Y_2a$ ($a$ being the limited-information estimate of $\alpha$) on all the independent variables of the system, whether or not they appear in the fitted equation with non-zero coefficients, and referring to Durbin & Watson's (1951) tables with $k' = k + q - 1$.

In practice it is unlikely that there will be much difference between the value of $d$ calculated from the residuals $z$ and that calculated from the residuals $z^*$. In fact the author conjectures that the test applies to the former value as well as to the latter, although he has been unable to prove this. Thus, if it is wished to avoid the extra labour of calculating the residuals $z^*$, it is suggested that a good approximate test could be obtained by calculating $d$ from the residuals $z$ and referring to Durbin & Watson's table with $k' = k + q - 1$.
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HETEROGENEOUS ERROR VARIANCES IN 
SPLIT-PLOT EXPERIMENTS 


By R. N. CURNOW 
Agricultural Research Council Unit of Statistics, University of Aberdeen 


SUMMARY. In a split-plot experiment, the common assumption is that the same error variance
applies to all subplot treatments. This paper is concerned with tests of significance for the departure
from equality of the variances for different subplot treatments, and with the estimation of the ratio
of a pair of such variances. The methods used may be regarded as extensions of those described by
Morgan (1939) and Pitman (1939) in connexion with the problem of comparing the variances of two
correlated and normally distributed measurements. In particular, Pitman's method is extended to
provide confidence limits for the variance ratio.

Suppose that visual acuities wi 
men, and that the possibility of 
be investigated. If the distribut 
their sum and their difference, 
distribution depends upon the 
they proposed to test the hypo 
correlation coefficient. In an 
may depend upon the treatm 
treatment increases the yields by a considerable amount, Th 
of two variances outlined above would allow for th 
plot, but is no longer applicable, 
methods of this paper enable tests of significance and 

An example of a simple split-plot experiment is given, with a discussion of the standard errors for 
the various treatment comparisons, 


x f 
ere measured, for right eye and left eye separately, on a sample o 
a difference between o2 


o 
and cj, the variances for the two eyes, Men i f 
ion of the original measurements is bivariate normal, then so is tha. 
Morgan and Pitman no 


ted that the correlation coefficient in the latter 
difference (02— 62), bein, 


thesis of equal variance 
agrieultural experiment 
ent it receives. In parti 


1. INTRODUCTION 
The simplest type of split-plot experiment consists of c blocks of a whole-plots, with each
whole-plot split into b subplots. Denote the a whole-plot treatments by $A_1, \ldots, A_a$ and
the b subplot treatments by $B_1, \ldots, B_b$. The usual model on which the analysis is based is

$y_{ijk} = $ […]   (i = 1, ..., a; j = 1, ..., b; k = 1, ..., c),

with […] $\alpha_i$ the effect of treatment $A_i$, $\beta_j$ the effect of treatment
$B_j$ and $\gamma_{ij}$ the interaction of treatments $A_i$ and $B_j$.

The usual assumptions made a[…]
[…] correlations between the $\epsilon$'s for the several subplots of each main plot, and in which the
$\epsilon$'s for each subplot treatment may have a different variance. Thus we take

$E(\epsilon_{ijk}^2) = \sigma_j^2$,
$E(\epsilon_{ijk}\epsilon_{ij'k}) = \rho_{jj'}\sigma_j\sigma_{j'}$.

The more usual assumptions mentioned above would have
$\sigma_j^2 = \sigma^2$
and $\rho_{jj'} = \rho$ for all $j, j'$.

This paper is concerned, in the main, with the particular case b = 2. Except when the
contrary is explicitly stated, it is hereafter assumed that each main plot has only two
subplots; of course, if b > 2 and a particular pair of subplot treatments is of interest, an analysis
as for b = 2 can be performed for these subplots alone.


2. A TEST OF THE HYPOTHESIS $\sigma_1^2 = \sigma_2^2$

Define S and D by

$S = y_{i1k} + y_{i2k}$,
$D = y_{i1k} - y_{i2k}$.

Further, let $E_{SS}$, $E_{SD}$ and $E_{DD}$ denote the error sums of squares and sums of products in
the analysis of covariance of S and D shown in Table 1. The breakdown of the sums of
squares for S and for D is the usual one for whole-plot totals and differences except that the
a(c − 1) degrees of freedom for the subplot error have been split into (c − 1) for the interaction
of treatments B and blocks and a further (a − 1)(c − 1).


Table 1. Analysis of covariance of S and D

Components for               |              | Sums of squares and products  |              | Components for
analysis of S                | d.f.         | S^2    | SD      | D^2         | d.f.         | analysis of D
Adjustment for general mean  | 1            |        |         |             | 1            | B
A                            | a − 1        |        |         |             | a − 1        | AB
Blocks                       | c − 1        |        |         |             | c − 1        | Subplot error
Whole-plot error             | (a − 1)(c − 1) | E_SS | E_SD    | E_DD        | (a − 1)(c − 1) | Subplot error
Total                        | ac           |        |         |             | ac           | Total


The […] subplot units, […] is […] $(\sigma_1^2 - \sigma_2^2)$. Therefore the F test for
[…] $\sigma_2^2$. Under the null hypothesis $\sigma_1^2 = \sigma_2^2$, and the usual normality assumptions,

$F = \dfrac{m E_{SD}^2}{E_{SS}E_{DD} - E_{SD}^2}$

has an F distribution with 1 and m degrees of freedom, where
m = (a − 1)(c − 1) − 1.
3. CONFIDENCE LIMITS FOR $\sigma_1^2/\sigma_2^2$

Confidence limits for $\sigma_1^2/\sigma_2^2$ can be obtained by an extension of the method due to Pitman.
Instead of the covariance analysis of S and D, consider the similar covariance analysis of

$S' = y_{i1k}/\sigma_1 + y_{i2k}/\sigma_2$

and

$D' = y_{i1k}/\sigma_1 - y_{i2k}/\sigma_2$.

Let the error sums of squares and products be denoted by $E'_{SS}$, $E'_{SD}$ and $E'_{DD}$. $S'$ and $D'$
are uncorrelated and therefore the variance ratio for testing the regression coefficient of
$S'$ on $D'$ against zero has an F distribution with 1 and m degrees of freedom, where m is again
ac − a − c. Thus

$t = \dfrac{\sqrt{m}\,E'_{SD}}{\sqrt{E'_{SS}E'_{DD} - E'^{2}_{SD}}}$   (1)

has a t distribution with m degrees of freedom.


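Now

$S' = \dfrac{S + D}{2\sigma_1} + \dfrac{S - D}{2\sigma_2}$

and therefore

$E'_{SS} = \dfrac{E_{SS} + E_{DD} + 2E_{SD}}{4\sigma_1^2} + \dfrac{E_{SS} + E_{DD} - 2E_{SD}}{4\sigma_2^2} + \dfrac{E_{SS} - E_{DD}}{2\sigma_1\sigma_2}
 = \dfrac{E_1}{4\sigma_1^2} + \dfrac{E_2}{4\sigma_2^2} + \dfrac{r\sqrt{E_1E_2}}{2\sigma_1\sigma_2}$,

where $\tfrac14E_1$ and $\tfrac14E_2$ are the error sums of squares and $\tfrac14 r\sqrt{E_1E_2}$ is the error sum of products
in the covariance analysis of $y_{i1k}$ and $y_{i2k}$.

[Of course $E_1$, $E_2$ and $r$ could be obtained directly […], but equality of $\sigma_1^2$ and $\sigma_2^2$
could not be tested by […] ratio because of the correlation between the […] more readily by […]
and taking $E_1$, $E_2$ and $r$ from this.]

Similarly

$E'_{DD} = \dfrac{E_1}{4\sigma_1^2} + \dfrac{E_2}{4\sigma_2^2} - \dfrac{r\sqrt{E_1E_2}}{2\sigma_1\sigma_2}$

and

$E'_{SD} = \dfrac{E_1}{4\sigma_1^2} - \dfrac{E_2}{4\sigma_2^2}$.

By substitution in (1)

$t = \dfrac{\sqrt{m}\,(E_1/\sigma_1^2 - E_2/\sigma_2^2)\,\sigma_1\sigma_2}{2\sqrt{(1 - r^2)E_1E_2}}$.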
If the $\alpha$ percentage point of the t distribution with m degrees of freedom is written $\tau$, then

$\Pr(-\tau < t < +\tau) = 1 - \alpha$.

Rearrangement of the inequalities leads to the final result that

$\Pr\left\{\dfrac{E_1}{E_2}\left[K - \sqrt{K^2 - 1}\right] < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{E_1}{E_2}\left[K + \sqrt{K^2 - 1}\right]\right\} = 1 - \alpha$,

where

$K = 1 + \dfrac{2(1 - r^2)\tau^2}{m}$,

$E_1 = E_{SS} + E_{DD} + 2E_{SD}$,

$E_2 = E_{SS} + E_{DD} - 2E_{SD}$,

$r^2 = \dfrac{(E_{SS} - E_{DD})^2}{(E_{SS} + E_{DD})^2 - 4E_{SD}^2}$,

m = (a − 1)(c − 1) − 1,

and $\tau$ = $\alpha$ percentage point of the t distribution with m degrees of freedom.
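
The rearrangement of the inequalities can be spelled out as follows. This is only a sketch of the algebra implied by the definitions of $E_1$, $E_2$, $r$, $m$ and $\tau$ in this section; the auxiliary symbol $u$ is introduced here for convenience and does not appear in the original.

```latex
\begin{align*}
u^2 &= \frac{\sigma_1^2}{\sigma_2^2}\cdot\frac{E_2}{E_1}, \qquad
t = \frac{\sqrt{m}\,(u^{-1} - u)}{2\sqrt{1 - r^2}},\\
t^2 < \tau^2
 &\iff (u - u^{-1})^2 < \frac{4(1 - r^2)\tau^2}{m}\\
 &\iff u^2 + u^{-2} < 2K, \qquad K = 1 + \frac{2(1 - r^2)\tau^2}{m},\\
 &\iff u^4 - 2Ku^2 + 1 < 0\\
 &\iff K - \sqrt{K^2 - 1} < u^2 < K + \sqrt{K^2 - 1}.
\end{align*}
```

Multiplying the last pair of inequalities by $E_1/E_2$ gives the limits for $\sigma_1^2/\sigma_2^2$.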


4. OTHER SPLIT-PLOT DESIGNS

If the whole-plot design is more complicated than simple randomized blocks (e.g. a Latin
square), an exactly analogous covariance analysis to that given above will lead to the test
of equality for the subplot variances and to the confidence limits for their ratio.

A further application is to Graeco-Latin squares in which the Greek and Latin letters
represent two different sets of treatments, applied separately to the two halves of each
whole-plot (Yates, 1937, § 16i). The errors associated with the two particular treatments
occurring together in any whole-plot may be correlated and have different variances.

Extending the analyses of variances of sums and differences (Yates, 1937) into a covariance
analysis, we have, for a p × p square, Table 2.


Table 2. Analysis of covariance of sums and differences for a Graeco-Latin square

Components for               |                | Sums of squares and products |                | Components for
analysis of S                | d.f.           | S^2    | SD     | D^2         | d.f.           | analysis of D
Adjustment for general mean  | 1              |        |        |             | 1              | Latin v. Greek
Latin treatments             | p − 1          |        |        |             | p − 1          | Latin treatments
Greek treatments             | p − 1          |        |        |             | p − 1          | Greek treatments
Rows                         | p − 1          |        |        |             | p − 1          | Subplot error
Columns                      | p − 1          |        |        |             | p − 1          | Subplot error
Whole-plot error             | (p − 1)(p − 3) | E_SS   | E_SD   | E_DD        | (p − 1)(p − 3) | Subplot error
Total                        | p^2            |        |        |             | p^2            | Total


The extension of the covariance analysis to more than one square would follow the usual […].
The tests of significance and determination of confidence limits proceed exactly as
above for the simpler type of split-plot design.
5. EXAMPLE

In an experiment to compare the yields of six strains of cocksfoot (Dactylis glomerata), the
strains were allocated at random to six whole-plots in each of four replicates. Each whole-plot
was split into two for the comparison of high and low levels of fertilizer. Yields of g[…]
from the first cut were about twice as much at the high level of fertilizer as at the low, and
it was therefore suspected that the error variances for these two levels might be different.

The analysis of covariance is given in Table 3 and the test for the regression of S on D
in Table 4. The value for F shows that $\sigma_1^2$ and $\sigma_2^2$ differ significantly at the 5 % level.


Table 3. Analysis of covariance of S and D

Components for               |      | Sums of squares and products                     |      | Components for
analysis of S                | d.f. | S^2            | SD             | D^2            | d.f. | analysis of D
Adjustment for general mean  | 1    | 293,594        | 128,418        | 56,170         | 1    | Fertilizer
Strains                      | 5    | 30,784         | −2,152         | 18,399         | 5    | Interaction
Replicates                   | 3    | 16,038         | 922            | 585            | 3    | Subplot error
Whole-plot error             | 15   | 37,618 = E_SS  | 19,817 = E_SD  | 39,330 = E_DD  | 15   | Subplot error
Total                        | 24   | 378,034        | 147,005        | 114,484        | 24   | Total

Table 4. Test for regression of S on D

                      | d.f. | S.S.   | M.S.  | F
Regression of S on D  | 1    | 9,985  | 9,985 | 5.06
Residual              | 14   | 27,633 | 1,974 |
Whole-plot error      | 15   | 37,618 |       |
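
The arithmetic behind Table 4 can be checked with a few lines of Python. This is a minimal sketch, not the author's procedure: the function name `regression_f_test` is hypothetical, and the inputs are the whole-plot error sums of squares and products quoted in Table 3.

```python
def regression_f_test(e_ss, e_sd, e_dd, error_df):
    """F test for the regression of S on D within the whole-plot error line."""
    regression_ss = e_sd ** 2 / e_dd        # sum of squares due to regression, 1 d.f.
    residual_ss = e_ss - regression_ss      # remaining error sum of squares
    m = error_df - 1                        # residual degrees of freedom
    f_ratio = regression_ss / (residual_ss / m)
    return regression_ss, residual_ss, m, f_ratio


# Whole-plot error line of Table 3: E_SS, E_SD, E_DD on 15 d.f.
reg_ss, res_ss, m, f_ratio = regression_f_test(37618, 19817, 39330, 15)
print(round(reg_ss), round(res_ss), m, round(f_ratio, 2))
```

The regression and residual sums of squares agree with the 9,985 and 27,633 of Table 4, and the ratio is referred to the F distribution with 1 and 14 degrees of freedom.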
Now

$E_1 = E_{SS} + E_{DD} + 2E_{SD}$ = 116,582,

$E_2 = E_{SS} + E_{DD} - 2E_{SD}$ = 37,314,

$E_1/E_2$ = 3.1244,

$r^2 = (E_{SS} - E_{DD})^2/(E_1E_2)$ = 0.0007.

For 95 % confidence limits, $\tau$ is the 5 % point of the t distribution with m = 14 d.f. Thus
$\tau$ = 2.145, giving K = 1.657 and $\sqrt{K^2 - 1}$ = 1.321. Therefore 95 % confidence limits for
$\sigma_1^2/\sigma_2^2$ are (1.05, 9.30).
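
The limits can be reproduced numerically. The sketch below assumes the formulas $K = 1 + 2(1 - r^2)\tau^2/m$ and limits $(E_1/E_2)[K \mp \sqrt{K^2 - 1}]$, which reproduce the values K = 1.657 and (1.05, 9.30) quoted here; the function name `variance_ratio_limits` is hypothetical, and $\tau$ = 2.145 is taken from the text rather than computed.

```python
import math


def variance_ratio_limits(e_ss, e_sd, e_dd, m, tau):
    """Confidence limits for sigma_1^2 / sigma_2^2 from the error line for S and D."""
    e1 = e_ss + e_dd + 2 * e_sd              # 4 x error sum of squares, first subplot treatment
    e2 = e_ss + e_dd - 2 * e_sd              # 4 x error sum of squares, second subplot treatment
    r2 = (e_ss - e_dd) ** 2 / (e1 * e2)      # squared error correlation between the two
    k = 1 + 2 * (1 - r2) * tau ** 2 / m
    spread = math.sqrt(k * k - 1)
    ratio = e1 / e2
    return ratio * (k - spread), ratio * (k + spread)


# Example of Section 5: m = 14 d.f., tau = 2.145 (two-sided 5 % point of t).
lower, upper = variance_ratio_limits(37618, 19817, 39330, m=14, tau=2.145)
print(round(lower, 2), round(upper, 2))
```

Because the lower limit exceeds 1, the interval agrees with the significant F test of Table 4.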
Standard errors for various treatment comparisons

Let $E_a$ and $E_b$ be the whole-plot and subplot error mean squares based on 15 and 18 d.f.,
respectively. Then, if the error variances for the two subplot treatments are assumed
equal, the standard errors for the various treatment comparisons are as follows:

Difference between two strain means: $\sqrt{\tfrac14E_a}$ = 25.0.

Difference between the two fertilizer means: $\sqrt{E_b/12}$ = 13.6.

Difference between the two fertilizer means for a particular strain: $\sqrt{\tfrac12E_b}$ = 33.3.

Difference between two strain means at the same fertilizer level: $\sqrt{\tfrac14(E_a + E_b)}$ = 34.4.

If the subplot error variances are different, the last standard error needs to be reconsidered.
The separate analysis of the two levels leads to exact t tests with 15 d.f. and the
standard errors are, for the high level,

$\sqrt{(E_{SS} + E_{DD} + 2E_{SD})/(4 \times 15)}$ = 44.1,

and for the low level

$\sqrt{(E_{SS} + E_{DD} - 2E_{SD})/(4 \times 15)}$ = 24.9.

More precise estimates of the standard errors are

$\sqrt{\tfrac14\left[E_{SS}/15 + (E^*_{DD} + 2E^*_{SD})/18\right]}$ = 41.9 and $\sqrt{\tfrac14\left[E_{SS}/15 + (E^*_{DD} - 2E^*_{SD})/18\right]}$ = 24.6,

where $E^*_{DD}$ and $E^*_{SD}$ are the error sum of squares and sum of products for the differences
based on all 18 d.f. The cost of including the extra 3 d.f. is that exact t tests are no longer
possible.
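
The standard errors quoted in this subsection can be recomputed from Table 3. The sketch below rests on two assumptions that are not stated explicitly in the surviving text: that the 18 d.f. subplot error pools the 15 d.f. error line with the 3 d.f. replicates line for D, and that the divisors (4, 12, 2, and 4 × 15) are those which reproduce the printed values. All variable names are hypothetical.

```python
import math

# Error line (15 d.f.) and replicates line (3 d.f.) of Table 3.
e_ss, e_sd, e_dd = 37618, 19817, 39330
rep_sd, rep_dd = 922, 585

e_a = e_ss / 15                      # whole-plot error mean square, 15 d.f.
e_b = (e_dd + rep_dd) / 18           # subplot error mean square, 15 + 3 d.f. (assumed pooling)

# Equal subplot variances assumed.
se_strains = math.sqrt(e_a / 4)
se_fertilizer = math.sqrt(e_b / 12)
se_fertilizer_within_strain = math.sqrt(e_b / 2)
se_strains_same_level = math.sqrt((e_a + e_b) / 4)

# Unequal subplot variances: separate analyses of the two levels, 15 d.f. each.
se_high_exact = math.sqrt((e_ss + e_dd + 2 * e_sd) / (4 * 15))
se_low_exact = math.sqrt((e_ss + e_dd - 2 * e_sd) / (4 * 15))

# Estimates using all 18 d.f. for the difference terms.
e_dd18, e_sd18 = e_dd + rep_dd, e_sd + rep_sd
se_high_pooled = math.sqrt((e_ss / 15 + (e_dd18 + 2 * e_sd18) / 18) / 4)
se_low_pooled = math.sqrt((e_ss / 15 + (e_dd18 - 2 * e_sd18) / 18) / 4)

for value in (se_strains, se_fertilizer, se_fertilizer_within_strain,
              se_strains_same_level, se_high_exact, se_low_exact,
              se_high_pooled, se_low_pooled):
    print(round(value, 1))
```

Under these assumptions every printed standard error is recovered to the quoted accuracy, which supports the reading of the garbled formulas but does not prove it.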


I am indebted to Mr G. J. F. Copeman of the North of Scotland College of Agriculture for
permission to use part of the results of an experiment on strains of cocksfoot in the numerical
example.
A MAXIMUM-MINIMUM PROBLEM RELATED TO STATISTICAL
DISTRIBUTIONS IN TWO DIMENSIONS


By A. J. HARRIS 
Road Research Laboratory, Department of Scientific and Industrial Research 


1. STATEMENT OF PROBLEM 

We consider the following problem, and some developments of it. In its finite form it may
be stated thus. A mass is distributed among the hk cells of a rectangle which has been divided
into h rows and k columns. The content of each row and of each column is known, but the
contents of the elementary cells are unknown. What is the least (or greatest) possible content
of any given group of cells?

In the infinite form we have a density distribution f(x, y) over the rectangle 0 ≤ x ≤ a,
0 ≤ y ≤ b. The group of cells is replaced by a given area which we call Region 1 or R1, the
remainder of the rectangle being R2. Since the maximum content of R1 can be found by
finding the minimum content of R2 we confine ourselves to minima. We require m, where

$m = \min \iint_{R1} f(x, y)\,dx\,dy$   (1)

subject to $f(x, y) \ge 0$,

$\int_0^b\!\int_0^x f(x, y)\,dx\,dy = A(x)$,   (2)

$\int_0^y\!\int_0^a f(x, y)\,dx\,dy = B(y)$,

where A(x) and B(y) are given monotonic functions increasing from 0 to A(a) = B(b) = N,
N being the total mass. It may also be necessary to assume that A(x) and B(y) possess finite […]

[…] say m′, then

$m \ge m'$.   (3)

It will be proved that the equality sign holds.
The minimum content of a region equals the minimum content of the enclosed rectangle
of greatest minimum content. In this connexion the word rectangle stands not only for
a compact rectangle, but for any area which can be formed into a compact rectangle by
a rearrangement of rows and columns. Such a rearrangement has no effect on the problem
at least in the finite form. The usefulness of this result lies in the fact that the minimum
content of a rectangle can be immediately written down, and that for simple forms of R1 the
rectangle of greatest minimum content can easily be discovered. It is not necessary to find
even one of the distributions of mass which actually satisfy the minimum condition.
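
For small finite problems the statement can be applied by direct enumeration. The sketch below is an illustration only, with hypothetical function names: it takes the minimum content of a (generalized) rectangle formed by a set of rows and a set of columns to be the larger of zero and (sum of those row totals) + (sum of those column totals) − N, and searches every such rectangle enclosed in the region. The search is exponential in the number of rows, so it is suitable only for toy examples.

```python
from itertools import combinations


def rectangle_minimum(row_totals, col_totals, rows, cols):
    """Least possible content of the cells lying in the given rows and columns."""
    total = sum(row_totals)
    excess = (sum(row_totals[i] for i in rows)
              + sum(col_totals[j] for j in cols) - total)
    return max(0, excess)


def region_minimum(row_totals, col_totals, region):
    """Greatest rectangle minimum over all rectangles enclosed in `region`.

    `region` is a set of (row, column) index pairs. Rows and columns may be
    rearranged freely, so a rectangle is any row set crossed with a column set.
    """
    if sum(row_totals) != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    region = set(region)
    n_rows, n_cols = len(row_totals), len(col_totals)
    best = 0
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Widest admissible column set for this row set; since totals are
            # non-negative, adding columns never lowers the rectangle minimum.
            cols = [j for j in range(n_cols)
                    if all((i, j) in region for i in rows)]
            if cols:
                best = max(best, rectangle_minimum(row_totals, col_totals, rows, cols))
    return best


row_totals = [3, 1]
col_totals = [2, 2]
corner = region_minimum(row_totals, col_totals, {(0, 0)})
diagonal = region_minimum(row_totals, col_totals, {(0, 0), (1, 1)})
top_row = region_minimum(row_totals, col_totals, {(0, 0), (0, 1)})
print(corner, diagonal, top_row)
```

In the 2 × 2 example the diagonal pair of cells has the same minimum as the single corner cell, because the corner is the enclosed rectangle of greatest minimum content.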


3. PROOFS 


The proofs which follow are complete for the finite form of the problem, both for unrestricted
and for integral values of the cell contents. The infinite form has been proved subject to
a restriction which involves both the nature of the area R1 and the marginal distributions
A(x) and B(y). Some developments of the problem are discussed: they include the minimum
content of an area contained in R1 when the content of R1 is maintained at its minimum.


(i) Minimum content of a rectangle

Consider Fig. 1. Vertical lines are denoted by L, horizontal lines by M, and rectangular
areas by the lines which bound them. Region 1, whose minimum is required, is the rectangle
$L_0L_1M_0M_1$ in the top left-hand corner. Suppose that the cell at P in R1 is not empty and
that a cell Q in $L_1LM_1M$ is also not empty. Then without violating the marginal conditions
we can transfer some mass from P to P′ provided we transfer an equal amount from Q to Q′.
This move reduces the content of R1. The process can be continued until either R1 or
$L_1LM_1M$ becomes empty. If R1 empties, its minimum content is clearly zero. If $L_1LM_1M$
empties, the content of $L_1LM_0M_1$ is N − A(x) […]; consequently
the content of R1 is A(x) + B(y) − N. It is easily seen that this is the least possible
content of R1, since a smaller content requires a negative content in the diagonally opposite
rectangle $L_1LM_1M$. The result may be summed up as follows. If

A(x) + B(y) − N > 0   (4a)

the minimum content is A(x) + B(y) − N, and the necessary and sufficient condition that the
content should have this minimum value is that the diagonally opposite rectangle $L_1LM_1M$
should be empty. If

A(x) + B(y) − N ≤ 0   (4b)

then the minimum content is zero.
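
The transfer argument can be imitated on a small table of cell contents. The sketch below is illustrative only (the function name `squeeze_corner` and the 3 × 3 example are assumptions, not taken from the paper): for every cell P of the corner rectangle and every cell Q of the diagonally opposite rectangle it moves as much mass as possible out of both, placing it in the two cells that share a row with one and a column with the other, so that all row and column totals are unchanged.

```python
def squeeze_corner(cells, corner_rows, corner_cols):
    """Reduce the content of a corner rectangle without changing the margins."""
    x = [row[:] for row in cells]                      # work on a copy
    other_rows = [i for i in range(len(x)) if i not in corner_rows]
    other_cols = [j for j in range(len(x[0])) if j not in corner_cols]
    for i in corner_rows:
        for j in corner_cols:                          # cell P in the corner rectangle
            for i2 in other_rows:
                for j2 in other_cols:                  # cell Q in the opposite rectangle
                    t = min(x[i][j], x[i2][j2])        # largest admissible transfer
                    x[i][j] -= t
                    x[i2][j2] -= t
                    x[i][j2] += t                      # P' : same row as P
                    x[i2][j] += t                      # Q' : same row as Q
    return x


cells = [[2, 1, 0],
         [1, 1, 1],
         [0, 1, 3]]
result = squeeze_corner(cells, corner_rows=[0, 1], corner_cols=[0, 1])
corner_content = sum(result[i][j] for i in (0, 1) for j in (0, 1))
print(result, corner_content)
```

After one pass either the corner rectangle or the opposite rectangle is empty, since each pair (P, Q) has had one of its members emptied and neither kind of cell ever gains mass. In the example the column totals of the corner give A = 6, the row totals give B = 6 and N = 10, so the remaining corner content equals A + B − N.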

(ii) Finite problem, general R1, cell contents not restricted to integral values

Any row or column which is completely empty may be disregarded without affecting the
problem. It is convenient to assume that this has been done, so that no row or column is
completely empty. It is convenient […] to the processes we employ in these proofs in
[…] easily picturable way. Therefore, […] it is essential to the present proof that we do
not restrict ourselves to integral values of the cell contents, […] visualize the problem as one
[…] arranging on the rectangle an immensely large number of […]
equal almost infinitesimal mass. We shall refer […] R1 as a 1-space, of R2 as […]
space. In like manner a ball in a 1-space is […]
language of the integral problem, referring to individual balls for example, we do not
assume any definite number of balls and may change the number in the course of the
[…]. […] solution of the minimum problem has been found. Certain spaces
contain balls, certain spaces are empty. If another solution of the problem is […],
a superposition of the two solutions is also a solution (the number of balls is different of
course in the three solutions, but as already explained this is irrelevant). The new solution
has the property that any cell which contained a ball in either of the first two solutions will
contain a ball in the new solution. By superposing all possible solutions we therefore obtain
a solution in which any space which can contain a ball does contain one. This is the solution
with which we conduct the argument.


Fig. 1. Rectangular region 1. [Figure labels: A(x), N − A(x), B(y), N − B(y).]


[…] to the corresponding position […]
changed so that the boundary conditions are sat[…]
1-ball has become a 2-ball. The move from P to […]
and may even increase it, therefore the minimum content of
Region 1 has been reduced. Since this is impossible, there cannot have been a 1-ball in the
row containing Q′.

We now carry out a rearrangement of the rows and columns. All rows which contain
a 1-ball are collected at the top; all columns containing a 1-ball at the left-hand side. In
Fig. 2 this rearrangement defines the lines $L_1$ and $M_1$, an[…]
the left of $L_1$ contains a 1-ball. Because o[…]
working (every space filled which can be) there is no solution which has a 1-ball outside the
rectangle $L_0L_1M_0M_1$. It is easily seen that $L_0L_1M_0M_1$ consists entirely of 1-space. If an
elementary rectangle within it were a 2-space we should have the […] which in the
previous paragraph was proved impossible: a 2-space whose intersecting row and column
both contain a 1-ball.
To indicate in Fig. 2 that every row within a certain area contains a ball, we mark in the
rows and put a dot, representing a ball, at the end of each row. The rectangle $L_0L_1M_0M_1$ has
both rows and columns marked in this way because both rows and columns contain a 1-ball.
To indicate in a similar way the presence of an elementary 1-space or 2-space we use the
same convention but replace the dot by a small circle.

We next carry out a rearrangement of the rows below $M_1$ and the columns to the right of
$L_1$. We collect at the bottom those rows which have a 2-space to the left of $L_1$, thus obtaining
the dividing line $M_2$. Every row in $L_0L_1M_2M$ therefore contains a 2-space, as shown. In
a similar way we obtain $L_2$, so that each column in $L_2LM_0M_1$ contains a 2-space. The regions
$L_0L_1M_1M_2$ and $L_1L_2M_0M_1$ consist, like $L_0L_1M_0M_1$, entirely of 1-space since not to do so
would contradict the principle by which $L_2$ and $M_2$ were defined. Since these regions are
entirely 1-space, yet lie outside the region occupied by 1-balls, they must also be empty.

Fig. 2. Solution, finite form.


Consider now the rectangle $L_2LM_2M$, which will be proved empty. Suppose that it is not
empty and that element A contains a 2-ball. Then since A lies below $M_2$, there is a 2-space at
[…] and to the left of $L_1$. Again, since every column in […]
there must be one at some space C. If we move the ball from A to […] and […] from C to D
we satisfy the boundary conditions. But […]. If D is a 1-space we have a solution of the
problem with a 1-ball outside the rectangle $L_0L_1M_0M_1$ within which all 1-balls must be
found. If D is a 2-space we have reduced the minimum content of Region 1. Since both
these […] are impossible the rectangle $L_2LM_2M$ must be empty. Similarly […] is empty.
The empty regions are shaded diagonally in Fig. 2.

Consider now the rectangle […]. […] consists entirely of 1-space. Its content is the
minimum content of Region 1, since […] all the 1-balls and nothing else. The
diagonally opposite rectangle […] and this implies that the content of
[…] is at its minimum. Thus m, the minimum content of Region 1, equals the minimum
content of […]. If there were another rectangle […] whose
minimum content m′ exceeded that of […], we should have

m′ > m.   (5)
But since this rectangle lies within Region 1 it cannot have a minimum exceeding that of
Region 1, so that

m′ ≤ m.   (6)

Since (5) and (6) are incompatible there is no such rectangle whose minimum m′ […].
In other words […] is a rectangle of greatest minimum content in Region 1. We […]
prove exactly the same thing of […].

We have therefore shown that the minimum content of Region 1 equals the minimum
content of a rectangle which lies entirely within Region 1 and has the greatest minimum
content of all such rectangles.

When there is a unique rectangle of greatest minimum content, […] and […] coincide and so
do […] and […]. Since we are assuming that there are no completely empty rows or columns it
is easily seen that this is the only possible arrangement compatible with the existence of
a unique rectangle.


lo 


immediately to the left of Ly, 
are empty spaces. Thus LL atest minimum content, and it pas 
a corner on the boundary curve. If DEE'F i 

a rectangle is A(x) + B(g(z))—N, 
complicated boundaries the rec 
expressed as a function of x in 


1M M, is a rectangle of gre: 


a similar fashion, 


(iii) Finite problem, general R1, cell contents restricted to integral values

The proof employs a process of rearrangement somewhat similar to that of the previous
proof but more elaborate. We now have only N balls, […]. If we have two solutions, the
superposition of these clearly does not produce a solution of the problem of N balls. It may not
even be a solution of the problem of 2N balls allowing half integral values.

We assume in this case that we have found one integral solution of the problem. In Fig. 4
we collect as before into the regions $M_0M_1$ and $L_0L_1$ those rows and columns which contain
a 1-ball. It is clear, as before, that $L_0L_1M_0M_1$ must consist entirely of 1-space, and that any
balls outside this space at present must be 2-balls. Within $L_0L_1M_0M_1$ each row or column
contains somewhere a 1-ball. This is indicated by the same convention as in Fig. 2.


Fig. 4. Integral values only.


and columns. Consider the columns Z4. All rows 
d to the bottom ofthe diagram, thusforming 
L. Asindicated in the figure each row 
LM, M,. The rectangles LL; M, M, 
ty. We may now prove that 


E now rearrange the remaining rows i 
the aining a 2-space within Lp L, M M are move ; 
in x M, M. In asimilar way we form the group Ze 
oL, M, M contains a 2-space, and each column in Ls 
ond Ly Ly M, M, consist entirely of 1-space and are therefore emp 
2 LM, M is empty. . T wil 
€ show first that it is possible to rearrange the balls in LI, M ,soastoo tain a ballin 
any desired cell. If the cell is empty the row and column which pass through it must each 
Contain a 1-ball in some other edil It is possible to remove these two balls and to replace 
em by one in the desired cell and one in the compensating position. Hi the cells concerned 
© Inside Z, Z, M, M, and all are 1-spaces 5° that no es rir D by such a move. 
iw i i i sition within LoM lx 
Suppose Pert p7 paese ae contains à shall at A. Then there must be 
“Paces at B and C in the same row and column as A. Dis the space which completes the 


poetan t D. The balls at A and D may now be replaced 
b a al vr e being à reduction in the 1-balls which are 
an , the net r 


i x 4. 
ET at their minimum. This is ım 
"oM ed next the remaining space OF" 
Y in this rectangle we move to the right to 


possible, s 
of the rows y 
join up 
i ^ ing dividing line
ividing line between empty and filled columns is Z4. The corr esponding M iding 

epis M,. We may now prove = iia M, J^ tegen dea ura PRA 

2-space. The row containing ain a 2 R 
MM : Hiara = by ps in iim is below a 2-space at say R, which again is level bh d 
~ ball d some point S in Lọ L, WM. We move the ball from Q to R and ar Jar 2j 
Fia T. Since T isin 1-space the ball at T remains a 1-ball. In a similar way, using ie ae 
Q' e. R and S’ to T" we get another 1-ball at T". But the balls at T and 7" may ber iem 
by a 2-ball at P and a 1-ball in the compensating position. By this last move the dm s 
is reduced. This is impossible, therefore P cannot be a 2-space. The only case in whi "d 
argument breaks down is that in which S' coincides with S. When a ball has ses I phi 
from S to T there may not in this case be another one to move to T". But, instead of m g 


à e 
a second ball from S to 7” it is permissible to move the ball at 7 to P, and to obtain as befor 
the impossible reduction of the minimum. 


" 9. ce 
We now proceed as before; collecting between M, and M, those rows which have a 2-spa 


j and 
within L, L, M, M,. In a similar way we define L,. As before it follows that L, Ly M, My ey 
LL, M, M, are entirely 1-space and therefore empty. The rectangle L, L, MW, M, may 
proved empty, and then L; and M; 


are found as before by collecting the filled columns = 
L L, M, M, to the left, and the filled rows of L, L, M, M, to the top. This process is ponten 
until it comes to a stop. At each stage it is possible to prove that the newly found diagon x 
rectangle is entirely 1-space if it occurs towards the top left-hand corner of the a ak 
empty if it occurs towards the bottom right-hand corner. As an example we now prove th 
LoL, M M, is empty. 


Tf there is a 2-ball at some point E, then it must be ich 
at F, and to find another 2-ball at G which may be moved up to H, and a third at J bean if 
may be moved up to K. This multiple move may be compensated by moving a 1-ball fr » 
some point U to V. Now level with V there is the 2-space F”, below which is a 2-ball at " ; 
We move this to H’, then the ball from J’ to K', and compensate the combined move s) 
moving a 1-ball from U’ to V’. As there are now 1-balls at V and at V’ we can replace em 
by a 2-ball at F’ and a 1-ball at the appropriate point in LaL, MM. In this last AE. x 
isi , the argument breaks down if K li le 
e impossible move is to move V directly to F’. The rectang 
Le L, M, M, is therefore empt; 


The process which we ha 
column: 


H i ce 
possible to move it up to a 2-8p9 


those which are entirely 1 


rom 
ps containing 2-spacefro 
kind. In Fig. 4theend is re. 


emaining strips are of p^ 

-balls are found in Li I, Ms i 
complete row or column of the fig" 

is empty the rectangle L; L, M, M. 


esM must contain a 2-ball i 
The final result in Fig. 4 is 
The process by which Fig. 4 is built up can end in a number of different ways. An examination
of the various possibilities shows that when it becomes impossible to subdivide the
remaining sets of rows or columns the 1-balls are always found to be contained in a rectangle
of greatest minimum content consisting entirely of 1-space. The theorem is therefore true in
all cases.

(iv) Infinite problem, general R1

The attempt to prove the theorem when the rows and columns are not given, and the
rectangle may be divided up arbitrarily, leads to difficulties connected with the nature of the
division of the rectangle into Regions 1 and 2. These difficulties may be avoided by restricting
ourselves to divisions which satisfy the condition explained below.

If we divide the figure into an arbitrary number of rows and columns each elementary cell
so produced must be assignable to one of three groups: Group A, those cells consisting
entirely of 1-space except perhaps for points on their boundaries; Group B, those consisting
partly of 1-space and partly of 2-space; Group C, those consisting entirely of 2-space, except
perhaps for points on their boundaries. The area formed by the cells of Group A we shall call
Region 1′, and the area formed by Groups A and B together we shall call Region 1″. These
regions are groups of cells as in the finite problem.

We now assume that the minimum contents of the rectangles of greatest minimum content
for Regions 1′ and 1″ differ by a quantity ε which can be made as small as we please by
increasing the number of rows and columns. It seems likely that in most practical cases,
where the subdivision is by means of one or more simple curves, and A(x) and B(y) possess
finite derivatives at every point, it would be possible to show that this condition was fulfilled.
If the derivatives exist, no line parallel to a side of the rectangle can carry a finite quantity
of mass, and any finite number of such lines may be added to or deleted from an area without
effect.

As a first step in proving the infinite case we divide up our rectangle into a finite number of
rows and columns as just described and consider the minimum contents of the Regions 1′
and 1″. Since the rows and columns are now given and the regions are of the correct type we
can construct equivalent finite problems by replacing the infinite boundary conditions by
the finite number of conditions which specify the contents of these rows and columns. We
know that we can find solutions of the equivalent problems. We have still to prove, however,
that the contents of the rows and columns can be so arranged within the rows and columns
themselves that the solution is also a solution in the infinite […] for
a subdivision into any other set of rows and columns. In a […]
Region 1′, where each cell belongs completely to one region or the other, there is no difficulty
in rearranging the content of an elementary rectangle in any […]
that in such a case the contents of the rows and columns may be […]
the infinite boundary conditions also. Therefore the minimum content […] Region 1′ or 1″
[…] by treating it as a problem in finite form is also the minimum content for Region 1′ or 1″
in the problem in infinite form.

Let the minimum content of Region 1 subject to the infinite boundary conditions be m.
There must exist some rectangle […] minimum content $m_R$ […]
exceeds or equals that of any other rectangle contain[…]. Now the rectangles of
greatest minimum content […] Regions 1′ and 1″ […]
quantity ε. Let these […] be denoted by […]
m′ and m″.
Now, apart from points confined to boundaries of rows or columns, Region 1′ is contained
in Region 1, which is contained in Region 1″; therefore their minimum contents under the
infinite conditions are in ascending order,

$m' \le m \le m''$.   (7)

But since Region 1′ is contained in Region 1 the rectangle of greatest minimum content in
Region 1′ cannot have a greater content than the corresponding rectangle of Region 1, and
this in turn cannot have a greater content than Region 1 itself, therefore

$m' \le m_R \le m$.   (8)

Remembering that m″ exceeds m′ by less than ε we may write (7) and (8) as

$m' \le m_R \le m \le m' + \epsilon$.   (9)

Since $m_R$ and m are quantities independent of the subdivisions which give m′ and ε, and
since ε can be made as small as we please, we must have

$m_R = m$   (10)

which proves that, in this case also, the minimum content of Region 1 equals that of the
rectangle of greatest minimum content within it.


(v) Minimum within a minimum, finite non-integral problem 


Suppose that after the rectangle has been divided into Regions 1 and 2 we subdivide 
Region 1 into Regions 1, and 1,, and require the minimum for Region 1, subject to oy 
boundary conditions and to the extra condition that the conten 
atits minimum. For brevity we shall refer to this as the conditio 


We assume that the subdivision is such that any cell of Region 1 belongs completely either 
to Region 1, or to Region 1,. 


t of Region 1 is posae 
nal minimum for Region tr 


InFig.5therowsand columns have first been arranged as in Fig. 2 in findin gthe minimum 
of Region 1; the dividing line: i ent being lettered as before L4, Lo, M, and My 
Regions shaded at 45? from top right to bottom left are empty because of the minimu?? 
WZ, which form part of the boundary of pe 
may, of course, be other empty regions whic d 
ir positions are unknown; elements of Region 1 which b 


outside the rectangle L,L, M M, are empty an Some of these to the right of X 


and WZ, 

It is convenient in st; 
Region 1 
we shall 


d there may be 


1 to confine it to those parts E: 
akes no difference to the MEE 
Region 1, which lies outside t > 
ntent of Region 1, must therefo" 


ı MM, as we did for the first minim 
a 1,-ball are collected into M, 


within LoL; M, M,; columns L,L, similarly have a 1 


e 
M; and LL. Rows JM, M, are those which have a 1,-5P2^ 
á ,
E inem of the rows and columns which pass through the rectangle LL, M; M,. Those 
s i re empty within this rectangle are grouped below JM; and to the right of Ly. It cannot 
med that every row and column in L, L, M, M, is filled, because, dlihoagh we assume 
ty t8 


Fig. 5. Minimum within minimum.


f step curve. 


Fig. 6. Formation 9 
uh Complete row or column to be filled, there are other spaces available besides those 

ch lie with; 
within the rectangle Ly Ly, Ms Ms. following Way; illustrated. separately in Fig. 6. 


9 Now rea ‘MM, M in the 
al aude the is Pa nd ses m left Ifany elements of it are d pM 
st column on . The column will then be divided at A into an upper 


© rows which contain these 2-spaces. 
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+ consisting entirely of 1,-space and a lower part consisting entirely of awe We -—— 
conie th Md with the second column and the rows above A, that is, we EPONE: ie E 
m bran ht down. The process is continued in this way until we reach L, in Fig. B e 
ios nat process produces a step curve ABCDEFG in which the elements forming xis 
dida face of each step are entirely 2-space while the space above the step curve is ee 
1,-space. As we have pointed out earlier the space below the step curve is not nec 
2- e. i 
It is to be noted that if we select any group of columns, say those between C and G in
Fig. 6, and change the order of the columns we shall get a different step curve. Nevertheless,
it will begin at C and end at G as before. Every row which makes up the vertical height
between C and G contains a 2-space, every row above G is entirely $1_1$-space. No rearrangement
of the columns between C and G can alter this fact so that in any arrangement the
steps must rise from C to G within this group of columns.


i w 

In Fig. 5 the-step curves X Y and WZ are constructed in this way. Since every row pn 

M, contains by definition at least one 2-space to the left of L,, the X Y curve reaches de 
M, at some point to the left of Lı. Similarly W is a point on L, above M,. We now see V 


e 
the spaces to the left of X Y and WZ may be shaded to indicate that they are empty becaus 
of the first minimum: they 


are 1-space and outside the rectangle Ly L, MW. M. 
Where Ls, L,and L intersect the step curve X Y wedraw the horizontal lines My, M; and se 
Strictly speaking, it is the intersection of the column immediately to the left of the Z li e 
which determines the intersection; e.g. where L; falls exactly on the edge of a step we mer, 
take the curve on the left to determine M s. The lines M;, M, and M; determine Le, L; and Zs 
in a similar way. hat 
We may now show that certain areas in Fig. 5 are empty. We have already noted t 1 
rectangle LL, M, M, has a 1,-ball in every row and column, and that rectangle Lg Lig Ma's 
has a 1,-ball in every row and column. Rectangle L,L,M,M, has a 1,-space in each TOW; me 
L, L, MM, a 1,-space in each column. These are all indicated by the conventions of Fig- ? 
The presence of 2-spaces in the risers of the ste 
By arguments similar to those used for the 
Lo L, M, M, consist entirely of 1, 
The rectangles L,L4 M, M, and L 
to perform impossible operations. 
a ball in the first of these may be 


compensated by removing a 1,-ball from LL, MM, to the right. This is the same sor ? 
move as that which showed that L, L, M, M, 


to be empty by exactly the arguments used in finding the first minimum are shaded in Fig: 
at 45° from top left to bottom right. 


s d 
Now consider L, L, M, M,. If thisis not empty let there be a 2-ball at A. This may be move 
sideways to a 2-space at B, and there must 


; Whatever its nature, may be 
a 1,-ball at F moved down to G, remaining a 1, 


of moves is that both minima are unaffected, 


definition it cannot occupy. Therefore, the move is impossible, and L, L, M, M; must 
empty. Similarly L, Lẹ M, M, is empty. 
Tt would be convenient to be able 


to prove that L5 L, M, M, 
order to get a completely empty recta: 


ty P 
1 or L, L, M, M, was emp 
ngle with an upper left- 


ver 
hand corner on the step €U* 
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en because there are no 1,-balls above the step curve between L; and L,. We 
a pem C next few paragraphs, however, that, if neither of these rectangles is empty, 
rude e : b wes the columns LL, and rows M, M, and to find a line Ly lying 
dre 5 and L, whose intersection with the rearranged step curve defines a line Mo for 
meth 2 r bs oe L,LMyy M is empty. A similar process applied to the rows M, M, and 
ele S L.L, gives us the lines JM, and L4, and an empty rectangle Z; LM, M. The last 

ments of these empty regions are shaded steeply from top left to bottom right in Fig. 5. 
h 


u— 


B À 


T 12222227 
A 
SESS 


between Ls and Z4. In LL; JM, M, every row 
that the step curve rises to M within 


Pig. 7 shows an enlarged version of the region 


Contain, fact 
S a 2-space which is indicated by the fact t 
© rectangle, Ms next inquire whether the lower right-hand rectangle defined by L; M, is 


~ pty or not. In Fig. 7 we have assumed that itis not empty, therefore some of the columns 


LD, lumns to the left thus defining their 
L4 M, hese filled colt 
4M, M, must be filled. We move thes! p M, is therefore empty. We nextinquire 


ndary lin T. Ja 

e L.. The remaining rectang'e “11 ; 
“ther there bed any rows containing 2-spaces in Dj Py M; Ms. If there are none we shall 
p Ye achieved our goal, since the step curve will then proceed horizontally along JM, to its 
ig ection with D and we shall have an empty bottom right-hand rectangle L4, LM, M 
th its corner on tho step curve T£, however, there are t T z mee 

$ A . Between L; an e step 
9 M at the bottom, thus defining their upper boundary Mir 5 1 


© must therefore rise to Mr it mi 
9 May now prove that L L MM; is empty: E ap ans CER io ane " 
ved Sidewa; si B below which there 18 a 2-ball a’ e ge E bein 
= ys to a 2-space at, pall from Z to F, is exactly like the move 
Cto 7, Pace at D. The compensating ™ «sibility if continued upwards. 


i ove, à la Mein 
In Fig. 5, and may similarly be shown to lead to an imp 
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We then inquire whether the rectangle L,, L, M, M, is empty. Ifit is we have Ee d 
goal, but if it is not we collect the filled columns to the left as before and define he ctm e 
Peo this point on we repeat the process. At each stage we inquire whether the row: w^ 
contain only l-space or whether the columns on the right are empty. If the iae 2 
affirmative we have reached our goal, if negative, we can reduce the remaining "pe oo 
rows or columns. The process must come to an end because the number of rows and co. " E 
is finite. It can end in one of three ways, (1)in giving the sought for result, (2) because 2 7 
remaining rows up to Jf, contain 2-space, (3) because all the remaining columns up to £4 

contain 2-balls. If (2) or (3) occurs we can prove a contradiction. di 
Fig. 7 illustrates case (2). All rows in Lj Lj5M,M,, contain 2-spaces. By the typ 
argument just applied to the ABCD move we can sh: 


ow that L, L, M, M,, must be empty- 
But this contradicts our initial assumption that L, L, M, M, is not empty. 


5 rom 
Tf case (3) occursit may be shown that the next step results in case (2). It may be ME We 
Fig. 7 that if when collecting the filled columns in the rectangle L,, LM, M; every co ae 
has been found to contain a 2-ball the next step would have been case (2), since the ro 


: is 
containing 2-spaces (or in other words the step curve) must reach up to M, before Ly 
reached; see step curve in Fig. 5. 


Thus, we may conclude that either L.L, 
described above must at some stage produce 
the step curve and have an empty lower rigl 


MyM, or L, L, M, M, is empty or the RS 
the lines Zy and M, (Fig. 5) which intersect E: 
ht quadrant. Similarly, there exist the lines 4 
and L4 with the corresponding properties for step curve WZ. a 
Consider now certain areas of Fig. 5. These are indicated by the lines A,, Ay, As, 4, draw’ 
below and alongside the rectangle. The rect; 

Lola M M, LyL,M, My, LEMM, and Ly 
either of 1 


gion 1, (i.e. subject je 
m content of a suse 
g to Region 1, the space which is nec? ^ 
e minimum of Region 1. The same thing is true of the rectang 
defined by the B lines in Fig. 5. 

Let the conditional minimum of Region $1_1$ be denoted by $m$. Let the rectangle just discovered
be denoted by R; its unconditional minimum is also $m$. Let the enlarged region be
denoted by Region $(1_1 + 0)$ and let its unconditional minimum content be $m'$.

Since Region $(1_1 + 0)$ is empty apart from Region $1_1$, its content is also the content of
Region $1_1$, which is $m$. Since no content can be less than the minimum content $m'$ we have

$$m \ge m'. \qquad (11)$$

But $m$ is also the minimum content of rectangle R which lies entirely within the Region
$(1_1 + 0)$ and cannot therefore exceed the minimum for that region, therefore

$$m \le m', \qquad (12)$$

$$m = m'. \qquad (13)$$

It is also evident that R must be a rectangle of greatest minimum content for Region $(1_1 + 0)$.

Thus finally we have shown that the conditional minimum of Region $1_1$ is the unconditional
minimum content of the enlarged Region $(1_1 + 0)$.


(vi) Minimum within minimum: finite integral problem 


A proof can be constructed by combining the procedures of Figs. 4 and 5. 


(vii) Minimum within minimum: infinite problem 
No proof on the lines of that given for a single minimum has been found. It seems likely,
however, that the theorem is true for most simple subdivisions of the rectangle.


(viii) Extension to a third minimum. 

If the division into regions is by means of simple and similar monotonic curves as in
Fig. 3, some of the difficulties of Fig. 7 do not arise. In Fig. 3 Region 1 is all the space above
DEF, Region $1_1$ is the space above GHJ, and the third minimum region is the space above
NPQ. In this simple case it may be shown that the minimum for the space above NPQ,
subject to the two preceding minima, is given by a rule similar to that found for two minima.
In this case the empty space to be added to form the enlarged region is that which is necessarily
empty because of the first two minima. This suggests that the theorem may be generalized to
further minima. It may well be valid for subdivisions other than those by similar
monotonic curves.

(ix) Further generalizations 


It might be expected that similar theorems would hold in more than two dimensions, the
rectangle of greatest minimum content being replaced by an n-dimensional rectangular
parallelepiped. But this is not so. The minimum content of a region cannot be less than that
of a rectangular parallelepiped contained in it, but in some three-dimensional cases it has
been shown to be greater.

4. EXAMPLES

(i) Uniform distributions on a square

This extremely simple case illustrates a peculiarity of the infinite problem: the appearance of
line densities on the boundary of Region 1. Consider a unit square whose top left-hand
diagonal half is Region 1. The distributions are uniform, i.e. $A(x) = x$, $B(y) = y$, and the
diagonal is $x = y$. The minimum content of a rectangle with one corner on the diagonal is
zero, so that the minimum content is zero. It is easily seen that, if the square is divided into
rows and columns, the content can be distributed uniformly in the elementary half-squares
adjoining the diagonal; in the limiting case this becomes a line density along the
diagonal, but from our point of view it must be considered as lying in Region 2. By considering
a parallel line just outside the diagonal boundary we may prove that the content of
Region 1 may be made as small as we please without the formation of a line density on the
diagonal.


(ii) Normal probability distribution: minimum within a circle

In a normal standardized bivariate probability distribution in x and y, centred on
$x = y = 0$, the probability in any strip of width $dx$ is given by

$$dp = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx, \qquad (14)$$

and similarly for $dy$. Let us suppose that the boundary conditions for our problem are in fact
given by (14), that is, the distribution is at least compatible with the normal distribution. If
the distribution were in fact normal we know that the probability $p$ in a circle of radius $r$ with
centre at the origin would be given by

$$p = 1 - e^{-r^2/2}. \qquad (15)$$

We then inquire what is the minimum probability within the circle of radius $r$ which is
compatible with (14).

The rectangle of greatest minimum content is a square inscribed in the circle. If the
half side of the square is denoted by $a$, the minimum content $m$ of the square is given by

$$m = \sqrt{\frac{2}{\pi}} \int_{-a}^{a} e^{-x^2/2}\, dx - 1. \qquad (16)$$

Both $m$ and $p$ are given in Table 1 for a range of radii. It may be noted that the change over
from zero content to positive content takes place where

$$r = 0.954, \qquad (17)$$

i.e. a circle whose radius is almost equal to the standard deviation can be empty without
making it impossible to satisfy the boundary conditions appropriate to a normal distribution.
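
The values of $m$ and $p$ can be checked numerically. The following is a minimal Python sketch, not part of the original paper; the function names are illustrative. It assumes that the half side of the inscribed square is $a = r/\sqrt{2}$ and that a negative value of (16) is to be read as zero content, which is how the zero entries of Table 1 are interpreted here.

```python
import math


def circle_content_normal(r):
    """Content p of a circle of radius r under the bivariate normal, eq. (15)."""
    return 1.0 - math.exp(-r * r / 2.0)


def circle_min_content(r):
    """Minimum content m from eq. (16) for the square inscribed in the circle."""
    # Assumption: half side of the inscribed square is a = r / sqrt(2).
    a = r / math.sqrt(2.0)
    # (1/sqrt(2*pi)) * integral_{-a}^{a} exp(-x^2/2) dx equals erf(a / sqrt(2)).
    strip = math.erf(a / math.sqrt(2.0))
    # Eq. (16) is 2*strip - 1; a content cannot be negative, so clip at zero.
    return max(0.0, 2.0 * strip - 1.0)


if __name__ == "__main__":
    for r in (0.2, 1.0, 2.0, 3.0):
        m = circle_min_content(r)
        p = circle_content_normal(r)
        print(f"r={r:.1f}  m={m:.3f}  p={p:.3f}")
```

Under these assumptions the minimum first becomes positive when the probability in the strip $(-a, a)$ reaches one half, i.e. when $a$ is the upper quartile of the standard normal distribution, which is consistent with (17).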
Minimum within minimum

Only a single case has been worked out. The first minimum is for a circle of radius 2 and the
second for the concentric circle of radius 1. It was found that the conditional minimum for
the inner circle was 0.233, whereas the unconditional minimum was only 0.041.


Table 1. Minimum content m of a circle of radius r compared with its content p
in a normal distribution

  r  |   m   |   p   |  r  |   m   |   p   |  r  |   m   |   p
 0.2 | 0.0   | 0.020 | 1.2 | 0.208 | 0.513 | 2.2 | 0.760 | 0.911
 0.4 | 0.0   | 0.077 | 1.4 | 0.356 | 0.625 | 2.4 | 0.821 | 0.944
 0.6 | 0.0   | 0.165 | 1.6 | 0.484 | 0.722 | 2.6 | 0.868 | 0.966
 0.8 | 0.0   | 0.274 | 1.8 | 0.594 | 0.802 | 2.8 | 0.905 | 0.980
 1.0 | 0.041 | 0.393 | 2.0 | 0.685 | 0.865 | 3.0 | 0.932 | 0.989

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FURTHER CONTRIBUTIONS TO MULTIVARIATE 
CONFIDENCE BOUNDS* 


By S. N. ROY AND R. GNANADESIKAN†


Institute of Statistics, University of North Carolina 


SUMMARY. In this paper the implications of certain results obtained in earlier papers (Roy & Bose,
1953b; Roy, 1954a, b, 1956) on confidence bounds on parametric functions connected with multivariate
normal populations are fully worked out. This leads to a number of confidence bounds,
expected to be useful, but hitherto unnoticed, on the characteristic roots connected with (i) one population
dispersion matrix, (ii) two population dispersion matrices, (iii) the regression matrix of a p set
on a q set, and (iv) the multivariate linear hypothesis on means, including, in particular, the problem
of discriminant analysis. Some examples are given in the last section of the paper to illustrate the use
of the techniques presented in the earlier sections.


1. CONFIDENCE BOUNDS ON ROOTS CONNECTED WITH ONE DISPERSION MATRIX

In conformity with the notation of previous papers, we consider a p-dimensional normal
variate with dispersion matrix $\Sigma$ and mean vector $\xi$, $N(\xi, \Sigma)$. The characteristic roots of $\Sigma$
are represented by $\Theta$. The dispersion matrix in a sample of size $(n+1)$ is written $S$ and its
characteristic roots represented by $\theta$. The smallest and largest roots we denote respectively
by $\theta_1$ and $\theta_p$, adjoining a suffix $\alpha$ to values between which they lie with probability $1-\alpha$,
writing for example $P(\theta_{1\alpha} \le \theta_1 \le \theta_p \le \theta_{2\alpha} \mid \Sigma) = 1-\alpha$.


With a slight change in notation, let us denote the characteristic roots of a matrix $M$ by $c(M)$
and the smallest and largest roots respectively by $c_{\min}(M)$ and $c_{\max}(M)$.
The statement (3.1.2) given in Roy (1954a) is exactly equivalent to

$$n\theta_{1\alpha}^{-1}(p,n)\, a'Sa \ge a'\Sigma a \ge n\theta_{2\alpha}^{-1}(p,n)\, a'Sa \qquad (1.1)$$

for all non-null $a(p \times 1)$'s, that is, to

$$\lambda_1 \frac{a'Sa}{a'a} \ge \frac{a'\Sigma a}{a'a} \ge \lambda_2 \frac{a'Sa}{a'a}, \qquad (1.2)$$

where $\lambda_1$ and $\lambda_2$ stand respectively for $n\theta_{1\alpha}^{-1}(p,n)$ and $n\theta_{2\alpha}^{-1}(p,n)$.

Choosing $a$ so as to minimize $a'\Sigma a/(a'a)$, we observe that the second part of the inequality
(1.2) implies that $\lambda_2 c_{\min}(S) \le c_{\min}(\Sigma)$; and choosing $a$ so as to minimize $a'Sa/(a'a)$, we notice that the
first part of the inequality implies that $c_{\min}(\Sigma) \le \lambda_1 c_{\min}(S)$. Likewise, choosing $a$ so as to maximize
$a'Sa/(a'a)$, we note that the second part of the inequality implies that $\lambda_2 c_{\max}(S) \le c_{\max}(\Sigma)$, and choosing
$a$ so as to maximize $a'\Sigma a/(a'a)$, we have that (1.2) implies that $c_{\max}(\Sigma) \le \lambda_1 c_{\max}(S)$. Thus (1.2) implies the
inequalities

$$\lambda_1 c_{\min}(S) \ge c_{\min}(\Sigma) \ge \lambda_2 c_{\min}(S), \qquad \lambda_1 c_{\max}(S) \ge c_{\max}(\Sigma) \ge \lambda_2 c_{\max}(S). \qquad (1.3)$$

Note that (1.3) has a confidence coefficient $\ge 1-\alpha$.

* 
Thi 
Research research was jointly sp 
“hool o of the Air Research and Deve 
Conomics and Political Science- 
Now With the Procter and Gamble Company- -e 


f Scientific 
; ir Force through the Office of 
onsored by the United se A Foro teh Techniques ai ei 
Jopment Command, 
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j i uing 
Going back to (1.2) let us take $a(p \times 1)$ such that the ith component is zero. Then arguing
in a similar manner, we observe that (1.2) implies

$$\lambda_1 c_{\min}(S^{(i)}) \ge c_{\min}(\Sigma^{(i)}) \ge \lambda_2 c_{\min}(S^{(i)}), \qquad \lambda_1 c_{\max}(S^{(i)}) \ge c_{\max}(\Sigma^{(i)}) \ge \lambda_2 c_{\max}(S^{(i)}) \qquad (1.4)$$

for $i = 1, 2, \ldots, p$, where $S^{(i)}$, $\Sigma^{(i)}$ stand respectively for the 'truncated' sample and population
dispersion matrices obtained by cutting out the ith variate. Likewise, if we take an
$a(p \times 1)$ such that the ith and jth $(i \ne j)$ components are zero and then argue in a similar
manner, we observe that (1.2) also implies

$$\lambda_1 c_{\min}(S^{(i,j)}) \ge c_{\min}(\Sigma^{(i,j)}) \ge \lambda_2 c_{\min}(S^{(i,j)}), \qquad \lambda_1 c_{\max}(S^{(i,j)}) \ge c_{\max}(\Sigma^{(i,j)}) \ge \lambda_2 c_{\max}(S^{(i,j)}) \qquad (1.5)$$

for $i \ne j = 1, 2, \ldots, p$, where $S^{(i,j)}$, $\Sigma^{(i,j)}$ stand respectively for the truncated sample and
population dispersion matrices obtained by cutting out the ith and jth variates. We can
continue this process on to the stage of cutting out any $(p-1)$ variates, that is, retaining any one
variate. It is seen that (1.2) implies a pair of statements (1.3), and also $p$ pairs of statements
like (1.4), $\binom{p}{2}$ pairs of statements like (1.5), and so on down to $\binom{p}{p-1}$, i.e. $p$ statements
involving only one variate.

All such statements will thus have a joint confidence coefficient $\ge 1-\alpha$, and will provide us,
from a certain standpoint, with a complete analysis of what the psychologists call the
problem of principal components.
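
The way in which a single statement of the form (1.2) carries with it the root bounds (1.3) and the truncated bounds (1.4) can be illustrated numerically. The sketch below is not from the paper: it takes $p = 2$ so that all roots have closed forms, uses two hypothetical matrices, and, purely for illustration, chooses $\lambda_1$ and $\lambda_2$ as the extreme roots of $\Sigma S^{-1}$, which are the tightest constants for which (1.2) holds for every non-null $a$. In the paper itself $\lambda_1$ and $\lambda_2$ come from the percentage points $\theta_{1\alpha}$, $\theta_{2\alpha}$, since $\Sigma$ is unknown.

```python
import math


def sym_roots(m):
    """Smallest and largest characteristic roots of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    return mid - rad, mid + rad


def relative_roots(sigma, s):
    """Roots of det(sigma - lam * s) = 0 for symmetric 2x2 sigma, positive definite s."""
    a, b, d = sigma[0][0], sigma[0][1], sigma[1][1]
    e, f, g = s[0][0], s[0][1], s[1][1]
    qa = e * g - f * f                      # det(s), positive
    qb = -(a * g + d * e - 2.0 * b * f)
    qc = a * d - b * b                      # det(sigma)
    disc = math.sqrt(max(0.0, qb * qb - 4.0 * qa * qc))
    return (-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)


def implied_bounds_hold(sigma, s, lam1, lam2, tol=1e-12):
    """Check (1.3) and the one-variate truncations (1.4) for p = 2."""
    s_min, s_max = sym_roots(s)
    g_min, g_max = sym_roots(sigma)
    ok = lam1 * s_min + tol >= g_min >= lam2 * s_min - tol
    ok = ok and (lam1 * s_max + tol >= g_max >= lam2 * s_max - tol)
    for i in range(2):                      # cutting out one variate leaves a scalar
        ok = ok and (lam1 * s[i][i] + tol >= sigma[i][i] >= lam2 * s[i][i] - tol)
    return ok


sigma = [[2.0, 0.5], [0.5, 1.0]]            # hypothetical population dispersion matrix
s = [[1.5, 0.3], [0.3, 1.2]]                # hypothetical sample dispersion matrix
lam2, lam1 = relative_roots(sigma, s)       # tightest constants satisfying (1.2)
print(implied_bounds_hold(sigma, s, lam1, lam2))
```

With constants for which (1.2) fails for some $a$, for instance $\lambda_1 = 1.0$ and $\lambda_2 = 0.9$ with the same two matrices, the check returns False, since (1.3) is then no longer guaranteed.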


2. CONFIDENCE BOUNDS ON ROOTS CONNECTED WITH
TWO POPULATION DISPERSION MATRICES

Similarly, if we have two p-dimensional normal variates with mean vectors $\xi_1$ and $\xi_2$ and
dispersion matrices $\Sigma_1$ and $\Sigma_2$ we may write equation (3.2.1) of Roy (1954a), putting
à= (nafna) 8 (p, m4, 29) and À, = (n1/n;) OZI(P, m4, na), as 

à> all e(S(u')3D,,, nS; UD yy, i) > Àg 
’s (with i = 59 os 


(21) 
where ¥,’s are o(2, Xz 1) ; D). We next recall that 
all non-zero c(A(y x 4) B(q x p)) = the non-zero e(B(q x p) A(p x q)) 


L 
i = AStA: 
4 transformation: 8, = AS# A’ and S; 
Where A is any non singular matrix, Putting A = 


ss 0 
W, we rewrite (2-1), without any loi 
generality, for our purpose, in the canonical form (2:2) 
À> all (SD, ST*D,,,) 2 À,, 

Sn à> all e(S,Sz1 Dis ST Dy) 2A ) 
H rd _ , 2^ a 

" 42 8,85 ta a ($,D,,, S1 ID) a = a'S Sya Ld 
"wa = a'a ES : 


a'a 
’s. Now choosing a so as to maximize the middle term of (2:3) 
thatthe left part o£ the inequality (23) implies that A. 


i to minimize the middle term of 
implies that CEN (Dy, Su 


wonot? 
E Dyni”? 

1©max (S157!) > Cmax (Dus " (29 

(2-3), we note that the right pat 


Dp) > ses (8,87 1). Thus (2-3) implies "m 
A Cmax (S1871) > Cmax (&D,,, ST D,,) > Corin (8, D Sr Dy) > Aria (8,85 - 
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It has been shown in Roy (1956) that 
Cmax (S,Dy,,Sr*Dyy,) > all e(D,,) > Cmin (S Dyy, S1! D,,). (2-5) 


Thus it is seen that (2.3) implies

$$\lambda_1 c_{\max}(S_1 S_2^{-1}) \ge \text{all } c(\Sigma_1 \Sigma_2^{-1}) \ge \lambda_2 c_{\min}(S_1 S_2^{-1}), \qquad (2.6)$$

which, therefore, is a confidence statement with a confidence coefficient $\ge 1-\alpha$, since (2.3)
has the confidence coefficient $1-\alpha$. (2.6) is derived in a slightly different way in Roy (1956).
We now go back to (2.3) and, as in the previous section, take $a(p \times 1)$ such that the ith
component is zero, argue the same way as from (2.3) to (2.6) and end up by observing that
(2.3) also implies

$$\lambda_1 c_{\max}(S_1^{(i)} S_2^{(i)-1}) \ge \text{all } c(\Sigma_1^{(i)} \Sigma_2^{(i)-1}) \ge \lambda_2 c_{\min}(S_1^{(i)} S_2^{(i)-1}), \qquad (2.7)$$

where $S_1^{(i)}$, $S_2^{(i)}$, $\Sigma_1^{(i)}$ and $\Sigma_2^{(i)}$ have the same meaning as in the previous section. Likewise, as in
the previous section, we note that (2.3) also implies

$$\lambda_1 c_{\max}(S_1^{(i,j)} S_2^{(i,j)-1}) \ge \text{all } c(\Sigma_1^{(i,j)} \Sigma_2^{(i,j)-1}) \ge \lambda_2 c_{\min}(S_1^{(i,j)} S_2^{(i,j)-1}), \qquad (2.8)$$

and so on till we reach the stage where any $(p-1)$ variates have been cut out, i.e. any one
variate has been retained, which gives us just the confidence bounds on variance ratios in
the univariate case. We have thus, with a joint confidence coefficient $\ge 1-\alpha$, a confidence
statement (2.6), $p$ confidence statements like (2.7), $\binom{p}{2}$ confidence statements like (2.8), and
so on. This again, from a certain standpoint, provides part of the analysis of a problem which
occurs in the multivariate generalization of the customary variance components analysis in
univariate analysis of variance and covariance.


3. CONFIDENCE BOUNDS ON ROOTS CONNECTED WITH THE REGRESSION MATRIX $\beta(p \times q)$
OF A p SET ON A q SET IN A (p+q)-VARIATE NORMAL DISTRIBUTION

Let the population be denoted 
ME 5 uH Zu =| 
N| aL i qLXi Ps 
1 p. Y 
and the dispersion matrix X, and the g set has mean 


M n 
rotor 5s and dispersion matrix Xs». X12 is the matrix of covariances between the p set and 
Xp. Let the sample dispersion matrix based on 


e q set and we define /(p x4) = " 
i È 
Sample of size (n+ 1) be written S = ke Fal and let B(p xq) = S1, 8a; be the sample 
12 2: 


So 
that the p set has mean vector E 


" : i 
gression matrix of the p set on the q set. Also, let us write 

Sia(p xP) = Su- Sy S 81e 

(4:5) of Roy (19544) as 


Seu; 
Stting, 42 = 0,[(1—0,) we can now rewrite 
(813) Cmax (932) 


R all q(B — 5) (B -2'1 <% 6m 

ae à confidence coefficient >1—# We recall the Lemmas C and E| of Roy (1954a), 

*. that (i) the statement ‘g, < alll «( ‘M) <go(for a Pp XP real matrix M with real roots)’ is 
26-2 


(3-1) 
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i nit 
equivalent to the statement 'g,«d'(1x p) M. (pxp)d(px1)«xg, (for all re de 
vectors d)’; and that (ii) the statement 'x'(1 x q) x(q x1)«A( a 0) is a oo 
statement ‘(x’(1 x q)d(q x1) < Ah (forallarbitrary unit vectors d)'. Using these 
we obtain from (3-1) the equivalent set of confidence statements, 


y— 3:2) 
di Bd, - Adus(,5) chus (52) < dj Bd, < di Bd, Ach S.) has, — ( 


for all unit vectors d,(p x 1) and d,(g x 1), with a confidence coefficient > 1 —a. d E 
to the above lemmas again we notice that, with respect to variation over di and E um 
maximum values of d; Bd, and d; fd, are respectively cbax(BB') and chash’). N S "e 
choosing d, and d, so as to maximize d; Bd, and then choosing d, and d, so as to maxim 
d fd, and arguing in the same way as in th 


he previous sections we note that (3-2) implies 
i Y 4 1—1 3:3) 
Chas (BB') xi Neha (8,5) chax(Sz) < hax (Af) <S ChaB Bi a) T Ach axl S12) Cinax (992 )s ( 


in ) ow 
which, therefore, is a confidence statement with a confidence coefficient >1—a. Wen 
rewrite (4-4) of Roy (1954a) in the equivalent form 


a B-A SB- a — , (3-4) 
a'S,,a SP 

for all non-null a(p x 1)’s, which is a, confidence 

This means that (3:4), with a probability 1 — 

in the previous sections, take a( 

B® as the ‘truncated’ matrice: 


observe that (3-4) also implies 


e statement with a confidence coefficient 1 F 
æ, implies (3-3), with a probability 21 ps ad 
p x 1) such that the ith component is zero, define S(), B® a 5 
s obtained by cutting out the ith variate of the p set, &! 


Cs (B BO) -= Ach (S) hax (Sx) S Cia (0/0) (3:9) 
SChnax( BB) + Ach, (800) cb. (S71) 

Likewise, as in the previous sections, we observe that (3-4) also implies 
Chas (BERBE) — rel, (st) nas 8") < chy (fe) pE) 9.6) 
< Chex BOOB) + Ah (SGPS — ( 


max 


and so on. We have thus, with a joint confidence coefficient $\ge 1-\alpha$, the statement (3.3),
$p$ statements like (3.5), $\binom{p}{2}$ statements like (3.6) and so on. This kind of result can be
generalized by truncating the variates of the q set as well, but this will not be discussed here.


4. CONFIDENCE BOUNDS ON ROOTS CONNECTED WITH 
MULTIVARIATE LINEAR HYPOTHESIS ON MEANS 


ris 
) and the sample mean ign 
ixis S. Setting 42 = 72/n + 1, where 72 i8 


we 
£'s T? distribution with P and n+1—p degrees of freedom." 
4:1-4) of Roy & Bose (19535) as 


y(a’Sa)t (e 
(a^a 
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(a’Sa)}/(a’a)}, with respect to variation over a's, are respectively (x’x)!, (EE)? and cb,.(S), 
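then we reason in the same way as in the previous sections and deduce that (4.1) implies

$$[\bar{x}'\bar{x}]^{1/2} - \lambda c_{\max}^{1/2}(S) \le [\xi'\xi]^{1/2} \le [\bar{x}'\bar{x}]^{1/2} + \lambda c_{\max}^{1/2}(S), \qquad (4.2)$$

which is thus a confidence statement with a confidence coefficient $\ge 1-\alpha$. Arguing as in
previous sections and using the same notation as before for 'truncated' $\bar{x}$, $\xi$ and $S$ obtained
by cutting out the ith variate, the ith and jth variates $(i \ne j)$, and so on, we have with a joint
confidence coefficient $\ge 1-\alpha$, in addition to (4.2), $p$ statements like

$$[\bar{x}^{(i)\prime}\bar{x}^{(i)}]^{1/2} - \lambda c_{\max}^{1/2}(S^{(i)}) \le [\xi^{(i)\prime}\xi^{(i)}]^{1/2} \le [\bar{x}^{(i)\prime}\bar{x}^{(i)}]^{1/2} + \lambda c_{\max}^{1/2}(S^{(i)}), \qquad (4.3)$$

$\binom{p}{2}$ statements like

$$[\bar{x}^{(i,j)\prime}\bar{x}^{(i,j)}]^{1/2} - \lambda c_{\max}^{1/2}(S^{(i,j)}) \le [\xi^{(i,j)\prime}\xi^{(i,j)}]^{1/2} \le [\bar{x}^{(i,j)\prime}\bar{x}^{(i,j)}]^{1/2} + \lambda c_{\max}^{1/2}(S^{(i,j)}), \qquad (4.4)$$

and so on down to the stage of cutting out any $(p-1)$ variates, i.e. retaining any one variate.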


4b. Some observations on multivariate linear hypothesis on means. Confidence bounds 
connected with univariate and multivariate linear hypothesis on means are discussed 
respectively in Chapters 15 and 16 of Roy (1954b). In this section we shall first set up 
a Physically more general hypothesis and then discuss the associated confidence bounds. 

Let $X(n \times p)$ consist of n row vectors $x_i'(1 \times p)$ (with $i = 1, 2, \ldots, n$) which are independently
distributed, such that $x_i$ is $N[E(x_i), \Sigma]$ and let $E(X)(n \times p) = A(n \times m)\,\xi(m \times p)$, where $m < n$
and rank $(A) = r \le m$. Let $A_1(n \times r)$ be a basis of $A$ and let us write (as we can, without any
loss of generality) $A(n \times m) = [A_1 \; A_2]$, where $A_2$ is $n \times (m-r)$, and let us rewrite the
expectation condition as

$$E(X) = [A_1 \; A_2] \begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix}, \qquad (4.5)$$

where $\xi_1$ is $r \times p$ and $\xi_2$ is $(m-r) \times p$.

Here the $X$ is a set of (observable) stochastic variates, $\xi$ is a set of unknown population
parameters, $A$ is a known matrix of constants given by the design of the experiment and is
called the design matrix. It might consist of numbers like, say 0, 1, etc. and/or a set of
observed (non-stochastic) quantities, as in the case of regression problems with concomitant
variates. The population dispersion matrix $\Sigma$ is also unknown. This is the model under which


Wi " 
? Propose to test the hypothesis 


Hy: s[n Cm I gir xp) jro xu) =0, (4:6) 
q-s Fu Oa | Lé(m-—r x p) 
r m-f i 
) are matrices given by the hypothesis to 


wi 
here C(g x m), partitioned as above, and M(p x% he right side of (4:6) stands for 
vi tested and are called the hypothesis matrices. The 0 on the A Vice rank (M) = u< p and 
X 1* u matrix whose elements are all equal to zero. It is assume ite ona si 
aus (C) 2 s& r(&m «nof course), and furthermore tha 12 


C ang 


t, row-wise, [Cu 


of C. We recall that for H, to be *testable' (i.e. the 


čare such that there exist unbiased 
for each of them) we should have 


(£7) 


» columnwise, Cu is also a basis 


Se a 21. x 
2t bilinear functions CEM of the unknown par pnis 
"mates. which are bilinear functions of the observations, 


Ciz = (Aj Ay) ti 40 (with i = 1,2); 
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n = . = such 
However, in most realistic problems, the C matrix of the hypothesis is Senan S se by 
thatthe list rows are absent and we can, therefore, without any essential loss of gene 

replace (4-6) by 


I C Jf &(rxp) 20 (4:8) 
a Ea Se $ A M(p xu) 5 ; 
4 
and (4-7) by C, = C(414,)241 Ay, ( 


ists of 
We now go back to X and observe that X(nxp) M(pxwu)[— X*(n x), say] peret 
n rows of independently distributed vectors Xi(1 x p) M(p x u) [2 x? (1 xu), say] su 
xf is NLE (x7), M'XM], i.e. N[E(x£), X*], say, and that 


* 4-10) 
E(X*)-[A, Aj H M-[A, Aj [Bl | 
say, where 
v [£, _ Ë A (£11) 
m-ra nsl] 
p u 
The H, of (4-8) can now be rewritten as 
Hs fC, O] ah =O) (4:12) 
r m—rv [El m—r 
u 
and the alternative H to H, can be expressed as 
H: gC, 0,] A r = 9*(8xu). (4:13) 
T m-—r |EX|m—r 


aL 
We next recall (16-63) of Roy (19545), viz., 


a’X'A,(4; A,)101, 0b — (a/Saj [5c,(p, 5, n — rJ] <a'y'Ob 
S9 X 44; 4))710,, Ob + (a'Sa)! [sc (p, s, n —7)]E, 
for all non-null a(p x 1)'sand all uni 


w ', 6 
t vectors b(s x 1), and substitute M'X',ie. X* for X 
for C, a*(u x 1) for a(p x 1) 


tement 
, u for p and 7* for 7. We then see that the confidence sta 
(4-14) is replaced by 


(414) 


a" X* A41 4,)301 Ob — (a*S*asy [se 


(Us $,7, -r)r 
<a*4* Db a* X AA1A,)- 


.15) 
10; Üb + (a*’S*a*)} [se (us, m—r)] (4 


&]|» 
for all non-null a*(u x 1) and all unit vectors b(s x 1), where 7* is given by (4-13), É 1 
(4:11), X* stands for XM, and where 


F .16 
i ÜÜ = [C(A1 4,303 go 
an 


(=) uxu) = M'(u x p) X'(p xn) 0) As(41 44) 4 (n x p) M(p xu): (4-17) 

et 
We note that (4-15) is a set of confidence Statements, with a joint confidence coeffic* 
1—a,on bilinear compounds of 7* 


ing 
» where 7*, defined in (4-1 3), may be regarded as measu™ 
the deviation from the null hypothesis ih 




i 4c. Further consequences of (4:15). Starting from (4-15) and arguing in the same manner as 
in $3 and setting c,(w,s,n—7) = c, (say), we note that (4-15) implies 
4 A [4 ^ + ^, d 
cba LX* 4,041 4,)7101 000,4; Ay) 14: X*] - [sc] chax(S*) < cas Dr" (UU) y*] 
< ob 4,014900 0C(A14;)14; X*]- [se] eas (9), (4:18) 


or substituting for ÜÜ' from (4:16), 


(sS) — [sa]! chax(S*) < eas Dr [C (41 43) 7 JN] das 8*9) + [c] hax(S*), 
(4:19) 


chax 
where the matrix due to the hypothesis, i.e. sS** is given by 

sg** — M'X' A414) C10(414)3 C17? OA Ay) AX, 

)S* is given by (417). Notice that (4-19) is a confi- 

a and that the middle term of (4:19) is 


(4-20) 


and the matrix due to the error, i.e. (n —7 
dence statement with a confidence coefficient 7 1- 
Zero if, and only if, the hypothesis H, is true. For p = 1, M(px u) will drop out (except for 
is trivial scalar factor, since u < p) and we shall have the univariate problem where c; (55) 
Will be replaced by just the sum of squares due to the hypothesis, Cmax(S*) by just the error 


mean sum of squares and 


[9 * [0 (414) Oo] .*] by just the scalar €" (0141 4,) C1] ^ w*. 


the same way as in $83 and 4a we see that 
ements like (4:19) involving ‘truncated’ 


h variate, (3) statements like (4:19) 


Cmax 


Starting from (4:15) and reasoning in exactly 
(415) also implies, in addition to (4:19), w stat 


SHO, S and qj? obtained by cutting out any it 
involving ‘truncated’? S*(62, S**6 D and ge obtained by cutting out any pair of ith and 
Jth variates (with 22-7), ends on. These latter confidence statements will thus have a joint 
Confidence coefficient > 1—«. 
It may be noted that the problem discussed in §4a is really a special case of the one discussed
in §4c; nevertheless, for expository purposes, it is worthwhile to discuss first a simple
problem like the one in §4a and then take up the most general one in §4c.


5. EXAMPLES TO ILLUSTRATE THE TECHNIQUES OF §§3 AND 4

5.1. To illustrate §4 or, more properly speaking, the method developed in §4c for the
general problem of multivariate analysis of variance of means, we use the data from a
numerical example discussed in a standard textbook (Rao, 1952). The data consist of three
kinds of physical measurements on 140 school children, of more or less the same age,
from six different high schools. To translate into the set-up of this paper, let $\mu_i(3 \times 1)$, for
$i = 1, 2, \ldots, 6$, denote the population mean vectors and let

$$\xi(6 \times 3) = \begin{bmatrix} \mu_1' \\ \vdots \\ \mu_6' \end{bmatrix}.$$

Then, if $X(n \times 3)$, where n = total sample size = 140, be the observation matrix, we can
write

$$E(X) = A(n \times 6)\,\xi(6 \times 3), \qquad (5.1.1)$$
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where A(nx6)=Pl 0 00... 07, mt... +25 = 140 
L0 0; . . . @ 
9-059 € . —.. 7 
0 0-0 GO . . . j 


and r(A) = 6. 
Also Ho: = ... = &(=)CE = 0(5 x 3), where 


0 .. 0 —I 
l 


1 
C(5x6) = |? iba: 


070 s. « L 2j | 
wE EN 
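so that $r(C) = 5$. Furthermore $H$: $C\xi \neq 0\,(5 \times 3)$. Hence we have $p = 3$, $n = 140$,
$r = 6$ and $s = 5$.
Next we take over from Rao's book the 'between' product moment and 'within' product
moment matrices (symmetric, upper triangle shown) given respectively by

$$sS^{**}(3 \times 3) = \begin{bmatrix} 752.0 & 214.2 & 521.3 \\ & 151.3 & 401.2 \\ & & 1612.7 \end{bmatrix},$$

$$(n-r)\,S^{*}(3 \times 3) = \begin{bmatrix} 12809.3 & 1003.7 & 2671.2 \\ & 1499.6 & 4123.6 \\ & & 21009.6 \end{bmatrix}.$$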


According to the set-up of this paper we are interested in obtaining, with a joint probability
or confidence coefficient $\geq 0.95$, say, simultaneous confidence bounds on (i) a parametric
function which is a measure of departure from $H_0$ (on all three variates), (ii) three parametric
functions which are measures of departure from $H_0$ (on any two variates), and (iii) three
parametric functions which are measures of departure from $H_0$ (on any one variate). Going
back to the middle term of the inequality (4.19), we observe that here both $A$ and $C$ are each
of the full permissible rank, and calculate out $[C(A'A)^{-1}C']^{-1}$ and obtain the following
parametric functions (on which we shall put confidence bounds):
For (i) SPD EE Ow- = 0 tay), 
where Ty PA "ly (k = 1,2,3) and [> 300 = N) Mjr -»àn| 
(5, k' = 1,2,3) stands fo 


rà matrix whose k, k’ 
for (ii) 


. Lewis 
element is the expression within [  ]. Like 
6 
da E nin — De) (5. — Me)! d 
= 09 (say) when k,k' = 2, S 


= OO (say) when k, k = 1,3, 
= 09 (say) when k, k’ = 1,2. 


and 
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Finally, for (iii) 1 z 
Cmax [= nau. — nh] 


= 9&9 (say) when k= 1, 
= 0%3 (say) when k = 2, 


and = 042) (say) when k = 3. 


Notice that for (i) we have to deal with a 3 × 3 matrix, for (ii), three 2 × 2 matrices, and for
(iii), three scalars for which $c_{\max}[\ ] = [\ ]$ itself. Notice also that where the null
hypotheses are not true, then on the six groups, $\Theta$ is the positive square root of the
characteristic root of the population 'between' product moment matrix for all the three variates,
next $\Theta^{(1)}$, $\Theta^{(2)}$ and $\Theta^{(3)}$ are respectively the positive square roots of the
characteristic roots of the population 'between' product moment matrices for the variates
(2, 3), (1, 3) and (1, 2), and finally $\Theta^{(2,3)}$, $\Theta^{(1,3)}$ and $\Theta^{(1,2)}$ are
respectively the positive square roots of the population


Table 1-1

Column   Matrix M (a)   c_max^{1/2}(M)   Matrix M (b)   c_max^{1/2}(M)
(1)      sS**               44.40        S*                 12.93
(2)      sS**(1)            41.42        S*(1)              12.77
(3)      sS**(2)            43.11        S*(2)              12.78
(4)      sS**(3)            28.65        S*(3)               9.81
(5)      sS**(1,2)          40.16        S*(1,2)            12.52
(6)      sS**(1,3)          12.30        S*(1,3)             3.35
(7)      sS**(2,3)          27.42        S*(2,3)             9.78


'between' sum of squares for the variates 1, 2 and 3. Up to this point the data and the analysis
are entirely realistic. To set up the confidence bounds (4.19) and similar ones obtained by
truncating the variates by ones, next by twos, we need $c_\alpha$, or, in other words, the 5 % points
of the relevant statistic with D.F. $p = 3$, $s = 5$ and $n - r = 134$. The construction of such
tables is under way and the percentage points are not available to us at the moment. From
certain inequality relations discussed in (Roy, 1953) we set very approximately

$$c_\alpha(p, s, n-r) = c_{0.05}(3, 5, 134) = 2.90,$$

so that $s\,c_\alpha = 14.5$ and $[s\,c_\alpha]^{1/2} = 3.81$.
In Table 1-2, the 1st column gives confidence bounds on $\Theta$, the 2nd, 3rd and 4th on
$\Theta^{(1)}$, $\Theta^{(2)}$ and $\Theta^{(3)}$, respectively, and finally the 5th, 6th and 7th on
$\Theta^{(1,2)}$, $\Theta^{(1,3)}$ and $\Theta^{(2,3)}$, respectively. The $\Theta$'s being intrinsically
non-negative, negative lower bounds would merely imply that the corresponding $\Theta$'s could come
down to zero values. The joint statement has a confidence coefficient $\geq 0.95$. As stated just
now, the analysis and the confidence statements are rendered somewhat approximate for lack of
tables of percentage points, which, if available, would have made the confidence statements
entirely correct.
It may be asked at this stage, what happens if we use Table 1-2 for testing of hypothesis.
We observe that the procedure on testing of hypothesis, while related to, is not identical with
the procedure on confidence bounds. As discussed in (Roy, 1953) there is a procedure (which
would be the right one) for testing $H_0$ with an exact preassigned probability, and, at the 5 %
level, $H_0$ would be just rejected on that level, but would not be rejected on a level, say, a
little less than 5 %. This agrees broadly with the conclusion given


Table 1-2

Variates cut out                              None     1st     2nd     3rd   1st and 2nd   1st and 3rd   2nd and 3rd
Lower bounds [= (a) - [s c_α]^{1/2} (b)]     -4.86   -7.23   -5.58   -8.73      -7.54         -0.46         -9.84
Upper bounds [= (a) + [s c_α]^{1/2} (b)]     93.66   90.07   91.80   66.03      87.86         25.06         64.68


in (Rao, 1952). However, if we use the above Table 1-2 on confidence bounds, for testing
hypothesis at a level < 5 % (and recall the approximation introduced by taking a rough value
of the percentage point); then, since all the confidence intervals include zero (which would
correspond to the null hypothesis), we would say that we should just about not reject $H_0$
on all 3 variates and on variates (1, 2), (1, 3) and (2, 3) and also on variates (1), (2)
and (3).
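The bounds of Table 1-2 can be retraced from the two product moment matrices quoted from Rao (1952). The sketch below is only an illustration under stated assumptions: it takes each bound to be (a) ∓ 3.81 (b), with (a) and (b) the square roots of the largest characteristic roots of the truncated sS** and S*, and it finds those roots by plain power iteration, which is a convenience chosen here and not the paper's method. The names `largest_root`, `truncated` and `bounds` are hypothetical. Recomputed values need not agree with the printed ones to the last digit, since the printed figures were rounded at intermediate steps.

```python
from math import sqrt

# 'Between' (sS**) and 'within' ((n - r)S*) product moment matrices from Rao (1952).
BETWEEN = [[752.0, 214.2, 521.3],
           [214.2, 151.3, 401.2],
           [521.3, 401.2, 1612.7]]
WITHIN = [[12809.3, 1003.7, 2671.2],
          [1003.7, 1499.6, 4123.6],
          [2671.2, 4123.6, 21009.6]]
ERROR_DF = 134      # n - r
HALF_WIDTH = 3.81   # [s c_alpha]^(1/2), the rough value adopted in the text


def largest_root(matrix, iterations=500):
    """Largest characteristic root of a symmetric non-negative definite
    matrix, by power iteration (an illustrative choice)."""
    k = len(matrix)
    v = [1.0] * k
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(t * t for t in u))
        v = [t / norm for t in u]
    mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(v[i] * mv[i] for i in range(k))  # Rayleigh quotient, |v| = 1


def truncated(matrix, keep):
    """Sub-matrix for the variates retained (0-based indices)."""
    return [[matrix[i][j] for j in keep] for i in keep]


def bounds(keep):
    """Lower and upper bound (a) -/+ HALF_WIDTH * (b) for the retained variates."""
    a = sqrt(largest_root(truncated(BETWEEN, keep)))
    s_star = [[x / ERROR_DF for x in row] for row in truncated(WITHIN, keep)]
    b = sqrt(largest_root(s_star))
    return a - HALF_WIDTH * b, a + HALF_WIDTH * b


# Retained variates in the column order of Table 1-2.
for keep in [(0, 1, 2), (1, 2), (0, 2), (0, 1), (2,), (1,), (0,)]:
    lower, upper = bounds(keep)
    print([i + 1 for i in keep], round(lower, 2), round(upper, 2))
```

Every lower bound produced this way is negative, which is the feature the text relies on when it says that all the intervals include zero.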


5-2. To illustrate §3we construct an arti 


the 
3 ficial example by taking over S** and S * from 
previous example and putting 


E. l x3) 
P —q = 3, n = 8 (sample size — 1, B(3x3)B(3x 3) = S**(3 x 3), S} 9(3 x 3) = S*(3 
and I o 
S»(3x3)-|0 1 0} =1(3), (say). 
00. M 
Thus all ($3?) = 1, We have also, 


; = 0:67, 8° 
very approximately, 0, (5. g, n) = Oo.95(3, 3, 8) = 


ix 
that A? = 0-67/0-33 = 2.03. Denoti ae: 


of the p set on the q set, by a2 
population regression matrices o 
f9(1 x 3) and B21 x 3) 


ng by /(3 x 3) the unknown population regression 3 
x 3), BO(2 x 3) and B°(2 x 3), respectively, the d 3); 
f (2,3), (1,3) and (1,2) on the q set and by &™ Pa of 
; » respectively, the unknown population regression Ve"... 
variates (1), (2) and (3) on the q set, we are interested in obtaining, with a joint proba 2,9) 
z 0-95, cross bounds on dba. (A") = 9 (say), on cà,.( BOB) = G(say, with i = oe 
and on Chax( BOBO) = 6n (say with i+ j =1,2 3) js 
DN 2,3). gjs 
Each of these @’s is a Proper measure of departure from a corresponding null hypothe 
We now have


Table 2-1

Column   Matrix M (a)           c_max^{1/2}(M)   Matrix M (b)      c_max^{1/2}(M)
(1)      BB'                        19.86        S_{1.2}               12.93
(2)      B^{(1)}B^{(1)'}            18.52        S_{1.2}^{(1)}         12.77
(3)      B^{(2)}B^{(2)'}            19.28        S_{1.2}^{(2)}         12.78
(4)      B^{(3)}B^{(3)'}            12.81        S_{1.2}^{(3)}          9.81
(5)      B^{(1,2)}B^{(1,2)'}        17.96        S_{1.2}^{(1,2)}       12.52
(6)      B^{(1,3)}B^{(1,3)'}         5.50        S_{1.2}^{(1,3)}        3.35
(7)      B^{(2,3)}B^{(2,3)'}        12.26        S_{1.2}^{(2,3)}        9.78
Table 2-2

Variates of the p set cut out    None     1st     2nd     3rd   1st and 2nd   1st and 3rd   2nd and 3rd
Lower bounds [= (a) - λ(b)]      1.50    0.39    1.13   -1.12       0.18          0.74         -1.63
Upper bounds [= (a) + λ(b)]     38.22   36.66   37.43   26.74      35.74         10.26         26.15
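As a check on how Table 2-2 is read, two of its columns can be worked by hand from the (a) and (b) entries of Table 2-1. The value λ ≈ 1.42 used below is an assumption: it is the square root of λ² = 2.03 rounded to two decimals, which is what the printed bounds appear to be based on.

$$\text{None cut out:}\quad 19.86 \mp 1.42 \times 12.93 = 19.86 \mp 18.36, \quad\text{giving } 1.50 \text{ and } 38.22;$$

$$\text{2nd and 3rd cut out:}\quad 12.26 \mp 1.42 \times 9.78 = 12.26 \mp 13.89, \quad\text{giving } -1.63 \text{ and } 26.15.$$

The second interval contains zero and the first does not.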


ence bounds on O, the 2nd, 3rd and 4th, respec- 
on 92, 90:3 and 09, all 


jns CEN " 
T'urning now to the problem of testing of the null hypothesis we wem thatit pt to 
9 this at an exact preassigned level (Roy; 1953a). As observed on ; M go * i 
Rote also that the procedure of testing of hypothesis, while rela : dye = e i M 
identical with that on confidence bounds. However, if we do Fh ei ci iar : Fs "i 
Ypothesis, then we first recall all the approximations involve 
“max(M)>o, (truncated M). We have now, necessarily, 


,9 06d and 0920969, 969. 
0290 2) (3 Q9: Qo 2 9*9, 
00, 09, 09; 902099», 907; (5-2-1) 


‘max ( 
We note next that the only intervals that include zero are those in columns 4 and 7. Thus we
should roughly conclude that (i) not all variates (1, 2, 3) are independent of the q set, (ii) none
of the variates (3) or (2) is independent of the q set, while variate (1) is independent of the
q set and (iii) not all variates of either (2, 3) or (1, 3) or (1, 2) are independent of the q set.
The conclusion (i) follows from the 1st column, (ii) from columns (5), (6) and (7), while (iii)
follows from columns (2), (3) and (4) and also the inequalities (5.2.1).


Concluding remarks. We hope to be able to illustrate much better and with realistic
examples and tables the methods of §§ 1, 2, 3 and 4 when two-tailed tables needed for §§ 1
and 2 and one-tailed tables needed for §§ 3 and 4 become available, as we expect them to be,
in the near future.


In conclusion, it is a great pleasure to thank the editor and the referees for their
valuable suggestions for the improvement of the paper both in form and in content.


TABLES FOR USE IN ESTIMATING THE NORMAL DISTRIBUTION
FUNCTION BY NORMIT ANALYSIS


PART I. DESCRIPTION AND USE OF TABLES
PART II. COMPARISON BETWEEN MINIMUM NORMIT χ² ESTIMATE
AND THE MAXIMUM LIKELIHOOD ESTIMATE
By JOSEPH BERKSON


Section of Biometry and Medical Statistics, Mayo Clinic, Rochester, Minnesota


Part I. DESCRIPTION AND USE OF TABLES


The cumulative normal distribution function has been used extensively in bio-assay and in
other experiments in which P, the probability of some all-or-none 'response', is a monotonic
function, increasing or decreasing, of a quantity x which measures the potency of the agent
producing the response. The function may be written as in (1), with some auxiliary quantities
defined by (2) and (3):


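$$P_i = 1 - Q_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{X_i} e^{-\frac{1}{2}u^2}\,du, \tag{1}$$

$$Z_i = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}X_i^2}, \tag{2}$$

$$\text{Normit of } P_i = X_i = (x_i - \mu)/\sigma = \alpha + \beta x_i. \tag{3}$$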


$P_i$, the probability of response at $x_i$, is given by (1) in terms of the integral of the
standardized normal function $N(0, 1)$. The normal frequency curve, $N(\mu, \sigma)$, with ordinate
$Z_i$ given by (2) is sometimes taken to represent the distribution of hypothetical resistances of
the individuals in the experimental population.* From (3), $\alpha = -\mu/\sigma$; $\beta = 1/\sigma$. In
bio-assay problems, one is frequently interested in the value of the dosage x corresponding to a
50 % response; this is given by $x_{50} = -\alpha/\beta$.†
Assuming this model, and given a set of observations $x_i$, $n_i$, $r_i$, where $n_i$ is the number of
individuals 'exposed' at $x_i$ and $r_i$ is the number responding out of the $n_i$, a method of
estimating the parameters, which is part of a procedure named 'normit analysis', has been
investigated by Berkson (1955a). The estimator provided by this method falls in the class
R.B.A.N. of Neyman (1949) (regular best asymptotically normal) and therefore is asymptotically
equivalent to the maximum likelihood estimator. For finite samples, at least in a variety of
conditions corresponding to situations met in practice, the variance and mean square error of
this estimator are smaller than those of the maximum likelihood estimator.‡
With $p_i = 1 - q_i = r_i/n_i$ representing the observed relative frequency of response at $x_i$, and
$X_i$ representing the normit corresponding to $p_i$ (observed normit or normit of $p_i$), if (1) gives
the true probability of response at $x_i$ the observed $X_i$ plotted against $x_i$ should fall along the
straight line $X_i = \alpha + \beta x_i$, subject to random variation of the $X_i$. The estimate of normit
analysis is obtained by minimizing

$$\chi^2(\text{normit}) = \sum n_i w_i (X_i - \hat{X}_i)^2, \tag{4}$$

where $\hat{X}_i = \hat\alpha + \hat\beta x_i$ is the estimate of $\alpha + \beta x_i$, and $w_i = Z_i^2/(p_i q_i)$ with
$Z_i$ given by (2) with $p_i$ replacing $P_i$. Since (4) is asymptotically distributed as $\chi^2$, it is
called the 'normit $\chi^2$', and the estimate of normit analysis is called the 'minimum normit
$\chi^2$ estimate'.

* This, however, is not a necessary part of the model. There are cases known in which such variable
resistance may not even exist and still (1) may pretty well describe the observed phenomena. The
model consists simply in the hypothesis that (1) evaluates the relevant probabilities, without implying
necessarily any particular mechanism underlying them.
† Not to be confused with $x_i$ ($i = 50$).
‡ See Part II.
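The unit weight $w = Z^2/(pq)$ and the product $wX$ that enter (4) can be computed directly from an observed proportion. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `normit_weight` is hypothetical, and the code is an illustration of the definitions above, not a substitute for the published tables.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standardized normal N(0, 1)


def normit_weight(p):
    """Return (w, wX) for an observed proportion p with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    X = _STD_NORMAL.inv_cdf(p)       # normit of p, from (1)
    Z = _STD_NORMAL.pdf(X)           # ordinate at X, from (2)
    w = Z * Z / (p * (1.0 - p))      # unit weight w = Z^2 / (p q)
    return w, w * X


w, wX = normit_weight(0.25)
print(round(w, 5), round(wX, 5))
```

For p below 0.5 the normit is negative, so wX is negative; at p = 0.5 the normit is zero and w takes its largest value, 2/π.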
The minimization of (4) is obtained by accomplishing a weighted least squares fit of
a straight line to the observations $X_i$, the weight of $X_i$ being $n_i w_i$. The estimates of $\alpha$ and
$\beta$, symbolized $\hat\alpha$ and $\hat\beta$, are explicit functions of the observations $x_i$, $X_i$ and are
achieved simply and directly without use of iterative procedures, from the following equations:


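$$\hat\beta = \frac{\sum n_i w_i (X_i - \bar{X})(x_i - \bar{x})}{\sum n_i w_i (x_i - \bar{x})^2}
= \frac{\sum n_i w_i X_i x_i - \sum n_i w_i X_i \sum n_i w_i x_i / \sum n_i w_i}{\sum n_i w_i x_i^2 - (\sum n_i w_i x_i)^2 / \sum n_i w_i}, \tag{5}$$

$$\hat\alpha = \bar{X} - \hat\beta \bar{x}, \tag{6}$$

$$\hat{x}_{50} = -\hat\alpha/\hat\beta, \tag{7}$$

where $\bar{X} = \sum n_i w_i X_i / \sum n_i w_i$, $\bar{x} = \sum n_i w_i x_i / \sum n_i w_i$.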


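For large samples the following formulas may be used to provide 'internal estimates' of
the standard errors of these estimates:

$$s^2(\bar{X}) = 1/\sum n_i w_i, \tag{8}$$

$$s^2(\hat\beta) = 1/\sum n_i w_i (x_i - \bar{x})^2, \tag{9}$$

$$s^2(\hat\alpha) = s^2(\bar{X}) + \bar{x}^2 s^2(\hat\beta), \tag{10}$$

$$s^2(\hat{x}_{50}) = \frac{1}{\hat\beta^2}\left[s^2(\bar{X}) + (\hat{x}_{50} - \bar{x})^2 s^2(\hat\beta)\right]. \tag{11}$$

When $\alpha$ only has to be estimated equation (6) gives this estimate on replacing $\hat\beta$ by its
known value $\beta$.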


The caleulations'require for e 


provided in Tables 1 and 9 (pp.414—19 below). For each valu 
Table 1 gives the required quantities, with 
unnecessary to compute p 
usually would be obtain 
calculate p, and to use Table 2 
P = 1, X is negativel 


ach observation the unit wei; 


were calculated using 
laces of those tables in 


* A generalization of this class of estimates has been formulated as 'minimum transform χ²
estimates' (Berkson, 1954).
Table 3. Example of calculations; data from Fisher & Yates (1953)

  x     n     r       w*          wX*          wx
  3     8     0     0.25813    -0.39601     0.77439
  4     8     0     0.25813    -0.39601     1.03252
  5     8     2     0.53857    -0.36326     2.69285
  6     8     3     0.61350    -0.19549     3.68100
  7     8     3     0.61350    -0.19549     4.29450
  8     8     7     0.38743     0.44569     3.09944
  9     8     8     0.25813     0.39601     2.32317

Σw = 2.92739,   Σwx = 17.89787,   ΣwX = -0.70456,

x̄ = Σwx/Σw = 6.11393,   X̄ = ΣwX/Σw = -0.24068,

Σwx² = 117.76905                  ΣwXx = -0.00013
(Σwx)²/Σw = 109.42640             ΣwX Σwx/Σw = -4.30763
Σw(x - x̄)² = Diff. = 8.34265      Σw(X - X̄)(x - x̄) = Diff. = 4.30750

β̂ = Σw(X - X̄)(x - x̄) / Σw(x - x̄)² = 0.5163,   α̂ = X̄ - β̂x̄ = -3.3973,
x̂50 = -α̂/β̂ = 6.5801.

s²(X̄) = 1/(nΣw) = 0.02470                                  (s(X̄) = 0.16),
s²(β̂) = 1/(nΣw(x - x̄)²) = 0.01498                          (s(β̂) = 0.12),
s²(α̂) = s²(X̄) + x̄² s²(β̂) = 0.6027                          (s(α̂) = 0.78),
s²(x̂50) = (1/β̂²)[s²(X̄) + (x̂50 - x̄)² s²(β̂)] = 0.1724        (s(x̂50) = 0.42),

X̂ = 0.52x - 3.40.†
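The calculations of Table 3 can be retraced with a short script that applies equations (5) to (11) to the same data. This is a sketch under two assumptions that are not stated in the surviving text: the weights are computed from the normal distribution instead of being read from Table 1, and a group with r = 0 or r = n is scored as r = 1/2 or r = n - 1/2. The second assumption is an inference from the tabulated weight for r = 0 at n = 8 (0.25813), which coincides with the weight at p = 1/16. The function name `normit_fit` is hypothetical.

```python
from statistics import NormalDist


def normit_fit(x, n, r):
    """Minimum normit chi-square fit of X = alpha + beta * x, after (5)-(11)."""
    std_normal = NormalDist()
    W = WX = Wx = WXx = Wxx = 0.0
    for xi, ni, ri in zip(x, n, r):
        # Assumption: 0 or n responders are scored as 1/2 or n - 1/2.
        ri = min(max(ri, 0.5), ni - 0.5)
        p = ri / ni
        X = std_normal.inv_cdf(p)              # observed normit
        Z = std_normal.pdf(X)
        nw = ni * Z * Z / (p * (1.0 - p))      # weight n_i w_i
        W += nw
        WX += nw * X
        Wx += nw * xi
        WXx += nw * X * xi
        Wxx += nw * xi * xi
    x_bar, X_bar = Wx / W, WX / W
    Sxx = Wxx - Wx * Wx / W                    # sum n w (x - x_bar)^2
    SXx = WXx - WX * Wx / W                    # sum n w (X - X_bar)(x - x_bar)
    beta = SXx / Sxx                           # (5)
    alpha = X_bar - beta * x_bar               # (6)
    x50 = -alpha / beta                        # (7)
    s2_X_bar = 1.0 / W                         # (8)
    s2_beta = 1.0 / Sxx                        # (9)
    s2_alpha = s2_X_bar + x_bar ** 2 * s2_beta                        # (10)
    s2_x50 = (s2_X_bar + (x50 - x_bar) ** 2 * s2_beta) / beta ** 2    # (11)
    return {"alpha": alpha, "beta": beta, "x50": x50,
            "s2_X_bar": s2_X_bar, "s2_beta": s2_beta,
            "s2_alpha": s2_alpha, "s2_x50": s2_x50}


# Data of Table 3: dosage x, number exposed n, number responding r.
fit = normit_fit(x=[3, 4, 5, 6, 7, 8, 9], n=[8] * 7, r=[0, 0, 2, 3, 3, 7, 8])
for name, value in fit.items():
    print(name, round(value, 4))
```

Run on these data, the script gives estimates that agree with the printed β̂, α̂ and x̂50 to about three decimals. One point deserves care: 1/(nΣw) evaluated from the tabulated weights comes to about 0.0427, which is the value consistent with the printed s²(α̂) and s²(x̂50), whereas the figure printed for s²(X̄) is 0.02470.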


[Figure: percentage response, on the normit scale X, plotted against x, with the fitted line.]

Fig. 1. The data of example for which calculations are shown in Table 3,
together with the line fitted by minimum normit χ².
* From Table 1. If the n_i are not all equal, as they are in the present example, these columns
contain nw and nwX [...].
† In practical work, it is useful to plot the data and [...] the estimated normit line. A number of
graph papers have been published for plotting linearly the cumulative normal distribution function.
A convenient one for practical work, in respect to size and [...], is [...] Book Company, Norwood,
Mass., U.S.A. The present data are shown plotted on this graph paper in Fig. 1.
Table 1. Normit weights


Upper figure in table is w = Z²/pq; lower figure is wX


For r < ½n, wX is negative. For r > ½n, use n - r as argument, and wX is positive


4 2 3 4 5 6 i |l g 9 10 
T ae -———— J 2 — mm 
| 
0 — | 953857 | 0-44951 | 0-38743 | 0-34222 | 0-30763 | 0-28031 | 0-25813 | 0-23974 | 0.22394 
736326 | 443480 | -44569 | -43858 | -42551 | -41078 | '99601 | -38187 | -36834 
| 
1 = 63662 | 59490 | -53857 | -48987 | -44951 | -41588 “38743 | -36317 | .34222 
0 725630 | .36326 | -41228 | -43480 | .44390 44569 | -44332 | .43858 
| 
2 == — — -63662 -62192 -59490 -56612 -53857 -51309 -48987 
0 719756 | -25630 | -32042 | -36326 | .39240 41228 
3 = == a = = -63662 | -62917 | -61350 59490 | -57567 
| 0 11321 | -19549 | .25630 | .30188 
4 — Ex: :63662 | -63211 | -62192 
0 -08838 | .15756 
5 = = x = = | 063662 
© | 
i I 
n 4 | E - 
à 11 12 13 14 15 16 17 18 19 20 


0 | 0:21059 | 0-19881 | 0-18850 0-17916 | 0-17090 | 0-16345 0-156089 | 0-15093 0-14520 | 0-14014 
30592 | .34420 | .33335 :32302 “31348 | -30458 


729048 | .28890 | .98143 | -27486 
l | :82386 | -30763 | -29324 | -28031 | -26880 "25813 | -24841 | .23974 | «23138 -22394 
48248 | 42551 | -41823 | -41078 | .40342 | ‘agent "88875 | -38187 | .37487 | ‘sansa | « 
2 | :40870 | -44951 | -43186 | -41588 | -40099 :98743 | -37477 | .36317 | 35 i 
42582 | -43480 | -44062 | -44390 | -44547 | “dango 4488 | -44332 All bd 
3 | -55672 | -53857 | -52138 | -50513 | -48987 


“47555 “46214 *4495] 4375 

033668 | -36320 | -38385 | -39986 | 41228 | imis 42923 73 s 
£ | 100800 | -50490 | -58048 | -50612 | -55214 | -58857 

721245 | 25630 | -29162 | -32042 | 34380 


42638 
43480 43895 “44191 


152557 | -51309 | soory | .48987 

| 086320 | -37920 | -39940 | “toago 41228 

5 | 063360 | 0-62645 | 0-61697 | 0-60623 | 0-59490 058337 | 0-57182 | 0-56 . 
007242 | -13177 | -18103 | -22202 | -25630 Ed ees 


0-538577 
28514 | .3096] :33034 | -34805 | -36326 
6 — | 7636602 | -63446 | -62917 | .62199 "61350 | .60437 | . ‘ 
0 706132 | -11321 | -15756 119549 | .998]5 Sean ES ‘30188 
1 = | s — -63662 -63501 -63092 62521 
| 3 | 761843 -61094 -60306 
I | 0 05307 | .09925 | .13937 | 17451 | .20533 | -23237 
a kes -63536 -63211 -62751 -62192 
: | 04687 *08838 :12492 :15756 
| — — | — *63669 -63561 -63298 
| 0 -04193 -07954 
10 = | um | 


| | — 0-63662 
ks -— | | 0 
Table 1 (cont.)
Upper figure in table is w = Z²/pq; lower figure is wX
For r < ½n, wX is negative. For r > ½n, use n - r as argument, and wX is positive


T 
M á 21 22 23 24 25 26 27 28 29 30 


| Ww — - 
9 |0:13537 | 0-13091 | 0-12679 | 0-12301 | 0-11961 | 0-11615 | 0-11308 0-11042 | 0-10727 | 0-10499 
L4 '26815 | .26194 | -25609 | -25061 | -24564 | -24050 | -23587 | -23179 :22692 | -22335 
l | -21689 | -21059 | -20445 | -19881 | -19338 | -18850 | -18354 117916 | -17506 | -17090 
736190 | -35592 | -34990 | -34420 | -33855 | -33335 | -32791 | -32302 “31833 | -31348 
2 | .33208 | -32386 | -31564 | -30763 | -30029 | -29324 | -28673 "28031 | -27449 | -26880 
“43561 | -43243 | -42910 | -42551 | -42193 | -41821 | -41458 | -41078 "40714 | -40342 
' 3 -41588 | -40589 | -39633 | -38743 | -37894 | -37090 | -36317 "35579 | -34880 | -34222 
044390 | -44511 | 414507 | :44569 | -44525 | -44445 | -44332 | -44193 | -44032 -43858 


+ , 47907 -46870 | -45885 | 44048 | -43186 | -42361 | 41588 | -40822 | -40099 
“41969 | «42582 | -43098 "43804 | 44062 | -44251 | 14390 | -44488 | -44547 


D 5 ' M Me NEA 
P O raag ib 5 | 41228 E E E | “anes | e 

MARY | esha | seus “529 +52 51309 | -5 E à 
EIIIIIIIEII 
ED —] | mM ‘sive | -24088 | 90208 36326 | 3:272 ‘38125 | 
EEE E EIE EEEE 
BE. sm ‘T7086 ‘21775 | 28787 | “38030 | 20204 | “28208 | S7 


10 | 0-635 ; -63041 | 0-62645 | 0-62192 | 0-61697 | 0-61173 | 0-60623 | 0-60062 | 0-59490 
D glo ES -13177 | -15756 | -18103 | -20236 | -22202 | .23989 | .25630 


uu Y. -63593 | -63409 | -63137 | -62797 | -62403 | -61973 | -61509 | -61026 

03e? | osasi | 00640 | 09532 | -12181 | -14017 | “tees "18903 | .20786 

12 | — | -63662 | -63604 | -03446 | -63211 | -62917 | -62573 | -62192 

| = end 0 :03190 | -06132 | -08838 | -11321 | 13627 | 15736 

UN | :63662 | .63612 | .63476 | -63272 | -63012 

= | 0 “02951-05688 | -08223 | -10585 

T | = — :63662 | -63619 | -63501 | 

| 0 :02744 | .05307 

15 0-636062 
= E" | 0 
Table 1 (cont.)


Upper figure in table is w = Z²/pq; lower figure is wX
For r < ½n, wX is negative. For r > ½n, use n - r as argument, and wX is positive


NI 31 32 33 34 35 36 37 | 38 39 40 
" 
O | 0-10223 | 0-09990 | 0-09802 | 0-093564 | 0-09371 | 0-09177 | 0-08981 | 0-08833 | 0-08634 | 0-08483 
:21897 | -21523 | -21219 | -20830 | -20514 | -20191 | -19862 | -19612 | -19272 | -19013 
1 :16737 :16381 :16019 | -15689 +15393 15093 “14789 | -14520 :14249 :14014 
-30931 | -30501 | -30058 | -29648 | -29274 | -28890 | -28496 | :28143 | -27782 | -27466 
2 26326 | -25813 | -25318 | -24841 :24384 | -23974 | .23559 | -23138 | -22768 | -22394 
:39964 | -39601 | -39237 | -38875 | -38517 38187 | -37844 | -37487 | -37166 | -36834 
3 :33589 | -32984 | -32386 | -31820 | -31285 30763 | -30276 | -29781 -29324 | -28884 
43606 | 43463 | -43243 | -43018 | -42789 | -42551 | -42317 | 742060 | -41823 | -41579 
4 39405 | 38743 | 38100 | -37477 | -36894 36317 | -35765 | -35241 -34726 :34222 
744572 | -44569 | -44540 | -44488 | -44419 | -44332 | .44231 44119 | -43994 | -43857 
a | 
5 | 0-44226 | 0-43535 | 0-42854 | 0-42214 | 0-41588 | 0-40977 | 0-40384 0-39827 | 0-39274 | 0-38743 
43745 | 43961 | -44141 | -44280 | -44390 | -44472 44528 | -44560 | -44574 | -44569 
6 | 48254 | -47555 | -46870 | -46214 | -45505 | -44951 44348 | -43758 | -43186 | -42638 
"41743 | -42188 | -42582 | -42923 | -43225 | -43480 43703 | 43895 | -44062 | -44191 
7 | -51660 | -50970 | -50286 | -49631 | -48987 | -48357 "47700 | -47161 | 4 5 
: 5 i » ) 46587 | -46025 
38887 | -39506 | -40186 | -40733 | -41298 | -41674 42005 | -42420 | .42733 | .43014 
8 | 754531 | -53857 | -53198 | -52557 | -51929 | -51309 “50708 | -50119 
à : 3 : S 49544 | -48987 
35402 | 36326 | -37164 | -37920 | -38608 | .39240 739810 | -40330 | -40802 | .41998 
9 | -56928 | -56301 | -55672 | -55058 | -54449 | -53857 :53268 | -52694 
i 3 ; FOODS ae g; -52138 | -51583 
31453 | :32597 | -33663 | -34627 | -35518 | -36326 “37077 | -37762 | -38385 “38966 
I 
10 | 0.58917 | 0.58337 | 0-57757 | 0-57182 | 0-56612 | 0-56049 0-55490 | 0-5 «b 
:27128 | -28514 | -29791 | -30961 | -32042 | -33034 “33955 | grs EU Enn 
11 | -60522 | -60014 | -59490 | -58966 | -58443 | .57919 “BT. +5687 2 s 
“22538 | -24133 | 25630 | -27003 | -28269 | -29444 “308% IND d pone 
12 | -61782 | -61350 | -60899 | -60437 | -59971 | .59490 | -5 s 
"17:24 | 19549 | -21245 | -22815 | -24262 | -25630 | 208m ‘28064 | 28048 | -57567 
:29162 | -30188 
13 | +62711 | -62370 | -62005 | -61620 | -61212 60795 | -60370 | .59939 
"1277 | -14802 | -16690 | -18436 | -20087 | -21614 :23034 | .94: ee | aga 
4377 | -25630 | -26794 
14 | -63321 | -63092 | -62821 | -62521 | -62192 61843 | -61476 | -6109 
"07701 | -09925 | -12009 | -13937 | -15756 | -17451 :19038 “20588 eis pond 
15 | 0.63624 | 0-63520 | 0-63360 | 063157 | 0-62917 | 0-6 i 
"02568 | -04973 | -07242 | -09343 | -11321 | 1312) bi yu $39 | 98195 
x Ə . Ts 
16 — 63662 | .63628 | -63536 | -63394 | -63211 :62994 | 6275 ine INC 
0 “02425 | -04687 | -06830 | -osese | “1o7ag | “02751 Nite, iu 
d ES = 5 š * 
1 -— 02003. 89082 | 3ESBGO.| satoa | anos | . 
0 -02281 04439 9 63062 | -62839 
- E 06450 | -08365 | .10176 | -11884 
>= x = zz = 662 | -63635 | -6356 
l| -63446 | -63298 
" t. z 0 02154 | 704193 | -06132 | .07954 
e — :63662 | -63638 | -63571 
0 :02042 | -03986 
20 — = = = = = 
= S — 0-63662 
0 
! 
Table 1 (cont.)


Upper figure in table is w = Z²/pq; lower figure is wX
For r < ½n, wX is negative. For r > ½n, use n - r as argument, and wX is positive


n 
E 41 
9 | 0-08330 | 
18750 
l| -13776 
“27144 
2 | -22044 
-36517 
3 | -28460 
41335 
4 | .33749 
43716 
5 | 0-38237 
44548 
6 | .42096 
44303 
7 | «45474 
43264 
8 | -48437 
41619 
9 | 51041 
:39499 
10 | 0.53330 
:37001 
ll "55337 
34196 
12 | .57089 
31143 
13 | .58606 
“27886 
14 | «59903 
:24463 
15 | 0.60994 
:20905 
16 | -61885 
17954 
17 | .g9593 
‘13503 
18 | -gsii9 
i "09689 
9 | 63467 
-05831 
20 0:63640 
"01946 
21 
22 
23 
24 
25 
CN 


44 


0:07919 


r1 
pc 
? 
D 


-44402 


-40589 
44511 
-43924 
43844 
| 46870 
-42582 
-49480 
40853 


; 0:51805 
-38738 
-53857 
-36326 
-55672 
-33663 
-57275 
-30779 
-58669 
-27735 


| 0:59873 
24549 


-60899 
-21245 
-61759 
-17831 
-62450 
-14355 


-62982 
-10820 


0-63360 
:07242 
-63587 
03620 


-63662 
0 


18029 | 


0-36769 | 


42 | 43 
0-08177 | 0-08022 
18483 | -18212 
-13537 | 
-26815 
-21689 5 
-36190 | -35881 
-28031 27644 
-41078 | -40838 
-33268 | -32820 
«43561 | 43405 
0-37721 | 0-37249 
44511 | 44464 
“41588 -41070 
-44390 | -44461 
44951 | -44429 
"43480 | 43075 
“47907 | 47376 
“41969 | -42294 
50513 | -19993 
39986 | «10437 
0-52307 
-38200 
-54342 
-35668 
-56612 | -56143 
-32042 | -32873 
-58157 -57713 
-28920 | -29884 
-59490 -59082 
-25630 | -26711 
0-60623 | 0-60250 
-22202 | -23411 
-61570 | -61239 
18647 | -19982 
-62331 | -62049 
15018 | -16476 
-62917 | -62692 
*11321 +12882 
:03332 | -63169 
“07574 -09232 
0-63580 | 0-63485 
-03795 | -05561 
“63662 | -63642 
0 -01851 


4b | a6 | 47 | 48 49 50 
| | 
| 0-07762 | 57  0-07498 | 0-07391 | 0-07283 | 0-07175 
| cl7701 | -17563 | -17277 | -17084 | -16889 | -16692 
-12886 | -12679 -12512 | -12301 | -12132 | -11961 
-25904 | -25609 | -25369 | -25064 | -24816 -24564 
-20723 | -20445 20164 | -19881 :19595 :19338 
-35265 | -34990 | -34709 | .34420 | -34124 | .33855 
-26880 | -26503 26148 -25813 125475 :25160 
-40342 | -40087 | -39839 | -39601 | -39354 | -39118 
-31968 | -31561 | -31155 | -30763 | -30387 | -30029 
-43078 | -42910 | -42731 | -42551 | -42371 | -42193 
0-36317 | 0-35876 | 0-35448 | 0-35032 | 0-34611 | 0-34999 
-44332 | -44252 | -44165 | -44070 | -43964 | -43857 
-40099 | -39633 | -39192 | -38743 | -38305 | -37894 
44547 | -44567 | -44574 | -44569 | -44551 | -44525 
-43437 -42955 42478 42022 *41588 "41147 
-43989 44116 44226 44317 *44390 *44451 
-46376 | 45885 | -45409 | -44951 | -44497 | -44048 
-42842 | -43080 | -43292 | -43480 | -43651 | -43804 
-48987 | -48505 | -48023 | -47555 | -47100 | -46649 
41228 | «41572 | 41895 | -42188 | -42454 | -42701 
| 0.51309 | 0-50829 | 0-50359 | 0-49887 | 0-49436 | 0-48987 
-39240 | -39698 | -40123 | -40525 | -40887 | -41228 
-53374 | -52903 | -52437 | -51996 | -51534 | -51091 
«36946 | -37519 | -38055 | -38554 | -39016 | -39452 
-55214 | -54757 | -54301 | -53857 | -53417 | -52984 
-34389 | -35076 | -35725 | -36326 | -36892 | -37422 
-50819 :56394 -55961 -55528 -55105 -54684 
-31626 -32434 -33183 "33894 "34556 -35181 
.58254 | -57838 | 57431 ‘56612 | -56208 
128704 | -29617 | -30465 “32042 | -32760 
| 5 T 5 
0-59490 | 0-59109 | 0-58720 0-57950 | 0-57507 
.95630 | -26641 | -27011 :29377 | -30188 
.g0558 | -60204 | -59849 -59131 | -58771 
19945] | -23556 | -24620 “26585 | -27487 
.81454 | -61142 | -60820 | -60496 | -60162 | -59829 
.19128 | -20355 | -21825 | -22626 | -23686 | -24677 
| 
.62192 | -61921 | -61641 | -61350 | -61050 | -60748 
715756 | -17086 | -18345 | -19549 | -20697 | -21775 
. .62552 -62317 -62064 -61806 -61534 
E .13751 | -15095 | -16399 | -17618 | -18798 
j 0-63041 | 0-62850 | 0-62645 | 0-62425 | 0-62192 
En .10349 | -11806 | -13177 | -14494 | -15756 
Bs .63386 | -63249 | -63092 | -62917 | .62725 
(53007 | «06925 | -08460 | -09925 | -11321 | -12664 
: .63593 | 63514 | -63409 | -63283 | -63137 
Eg .03461 | -05084 | -06640 | -08112 | -09532 
.63662 | -63646 | -63599 | -63526 | -63429 
0 -01691 | -03317 | -04877 | -06370 
ud = "63662 | -63647 | -63604 
0 :01627 | -03190 
2 = - — — | 0-63662 
0 




Table 2. Normit weights


Upper figure in table is w = Z²/pq; lower figure is wX
For p < 0.50 on left, wX is negative. For p > 0.50 on right, wX is positive


Thousandths, for p on left | 
| | 
p a 
o 1 2 3 | 4 5 6 7 8 9 | 
| 
z 199 | 5 T " 7 | 0-05 0-06054 0-06624 | 007175 
, = -01135 | 0-02014 | 002799 | 0-03523 | 0-04203 | 0-04847 | 0-05463 0 tad ar 
"M T 03507 -05796 | -07690| -09343| -10825| -12177| -13424| -14584| -15670 E 
à : :09226 | -09707 | -10177 | -10636 | -11087| -11528| - 
-01 | 0-07175 | -07709 | -08228 | -08733 | -09 AUT HOST) iH bs 
01 0092 | -17657| -18572| -19443 | -20272 “21064 | 21823 | -22550 sie eiie em. 
. “11961 | -12386 | -12803 | -13213 | -13617 | -14014 | -14404) -14789 | -1516 -155 ‘15910 
02 | (baseal 25187 | -25787 "26366 | -26925 | -27466  -27989  .28496 | -28987 | -29462 wet 
T T “17 ë 5 " T Bi 
E -15910 | -16273 | -16631| -16984| -17333| -17678| -18018| -18354| -18686 19014 38 
03 | "39923 | -30370 -30804| -31225| -31633| -32031| -32416) .32791 333156 | -33511| -33855 
7 E -19659 | -19976 | -20289| -20600| -20906| -21210! .21510| -21808  .22102| -22394 
0^ | 33805 | -34191 34517 734835 | :35144 | -35445 | -35738 | -36023| -36300| -30571| -36834 
0-05 | 022391 0.22682. 0-22968 | 023251 | 0-23531 | 0-23809 | 0-24084 | 0-24357 | 0-246927 | 024895 | 0-251060 
"36834 | -37091 | -37340| -37584) -37821| -38051, -38276 | -38495| -38708 | 38916 | -39118 
"06 | :25160 | -25423 | -25684| -25942| -26199| -26453| -26705| -26055| -27203| -27449| .27093 
39118 | -39315 | -39507 | -39694| -39875| -40052| -40295| .40392 | .40555| 40714 | 10808 
"07 | :27693 | -27935  -28175 | -28413 | -28649| -28884| -29116 | -29347 | -29576| .29804| -30029 
"40868 | -41019| -41165) -41307| -41445| -41579| -41709 | 41836 | -41958| 42078 ^ 12193 
"08 | :30029 | -30253 | -30476 | -30697 | -30916 | -31134| -31350| -31504| -31777 31989 | -32199 
“42193 | -42306 | -42415| -42520  -42022| .49722 42817 | -49910| -43000 43087 | -43171 
"09 | :32199 | -32407 | -32014| -32820| -33024 | -33227  -33429| -33629| -33828 "34026 | .34222 
“$3171 | -43251| 443330 | -43405 | -13477| -43547 | -43614| 43679 | -43741 43800| -43857 | 
| | 
0-10 | 0.34222. 0-34417 | 0-34611 | 0-34803 | 0-34994 | 0-35184 | 0-35373 | 0-35560 0-35747 | 0-35932 | 0-360116 
743857] 43912| -43964| -44013| -44001| -44106| -44148| 44189 | 14927 44263 | -44297 
‘11 | :36116 | -36299| -36480 | -36001| -36840| -37019| -37196| -37372 :37547 | -37721| -37894 
44297 | -44329 | -44359| -44386 ^ -44412| -44436 "44457 | 44471 | 44495 | -44511| -44525 
12 | -37894 | -38066 | -38237| -38407 | -38576 | -38743| -38910| -39076 :39241| -39405 | -39568 
144525 | -44537 | -44548 | -44556 | -44563 | -44509| -44572| 44574 "44574 | -44572 | -44569 
‘18 | 39568 | :39730| -39891 | -40051| -40210 | -40369 | -40526| -40682 40838 | -40993 | -41147 
744569 | -44564 | 44558 | -44550 | -44540 | -44529| -44517| -44502 “44487 | -44470 | -44451 
714 | -41147) -41299 | -41452 | -41603| -41753| -41903| -42051 “42199 | -42346 | -42492 | .42638 
"44451 | 44431 | :44410 | -44387| -44363 | -44338 | -44311 | -44283 "44254 | -44223 | .44191 
“15 | 0-42688 | 0-42782 | 0-42926 043069 | 0:43211 | 0-43353 | 0-43493 | 0.43633 043772 | 043910 | 0-44048 
‘44191 | 44158  -44123| 444088 | -44051| -44012| -43973 43933 | -43891 | -43848 43804 
716 | 44048 | -44185 | -44321| -44456| -44591| 444725 | -44858 44990 | -45199| .45253| .45383 
743801 443759 | 443713 | -43665 | 443617  -43507 443516 49405 | 43412 “43358 43303 
"17 | 45383) 45513 | -45642| -45770| -45898| -46025| -46151 46276 | -4 “465 -46649 
043303 -43247| -43191 | -43133| -43074| -43014| -42953 -42892 “43899 42000 “40701 
‘18 | 46049 | :46772| -46894, -47016| -47136| -47257| -47376 “47495 |. : -47849 
"42701 | -42635| -49569) -42502| -42433 | -42364 42294 | 42224 432133 3086 142006 
719 | 47819 | 47965 | 48081 | -48196 ^ 48311 -48425 48539 | -486 48764 | . . 
42006 | -41932| -41857| -41781 | -41705| -41627| 41549 Ataro baie aie eed 
020 | 0-48987 | 0-49097 | 0-19207 | 0-49317 | 0-49425 | 0-49534 049641 | 0-49748 | 0-49855 | 0. i 
41228 | -41146 | -41063| -40980 "40895 | -40810 | -40725 “40638 po yon M35 
“21 | :50066| -50171| -50275 :50379 | -50482 | -50585| -50687 +507 E s ^ 
“40875  -40285| 40195] -40105 | -40013| -39921| -39859 | ‘30738 | 30890 gees | RI 
| 792 | -51091) -51190| .51989 | .51387 | -51485 | :51583| -51680| .51776| «51979 i 
mass | 29856 | 39259 | -39162 | -30005 | :38966| 35508 | anae | “51872 esT quos 
| 723 | :52062| :52157| .52250| .52344 | -52437 -52529 52621 | -52712 
| | . *528 E P 
| 36466 | :38364 | -38262| -38159 | -38055 | -37951| 37025 | -37141| 3.093 Nu ds 
24 | 52984) -53073| .53162| .53251| -53339 :53426 | -53513 | .53600| . à 
:97422 | -37315| .37207 | .37099 | .36990 | -36881 :36771| -36660 136549 "30488 30320 
= ——Ó 
9 8 7 6 5 4 3 2 1 0 
| 


Thousandths, for p on right 


| -98 


| 0-89 


Table 2 (cont.) 


Upper figure in table is w = Z²/pq; lower figure is wX.
For p < 0.50 on left, wX is negative. For p > 0.50 on right, wX is positive.


Thousandths, for p on right


Thousandths, for p on left
p = 
| 
0 1 2 $2 | 4 | 5 7 8 9 
I-— | i € 
0-25 | | | | o5 5 | 
5 0-53942 | 0-54026 0-54110 | 0-54193 | 0-54276 | 0-54359 | 0-54441 | 0-54523 | 0-54604 | 0-54684 | 0-74 
| ‘36101 | -35987 | -35874| -35759 | -35645| -35529| -35414 | | -35181 
26 | 54845 | -54924) -55003| -55081| -53159| -55237| -55314 -55468 | -73| 
| 34946 -34829 | -34710 -34591 | -34472 | -34353 | -34233 -33991 
27 | -55694 | 69, -55843 | -59917, -35990 | -50063| -56136| -56208 | -72 
| -33626 | -33504  -33381  :33257| -33134 | -33010| -32885| -32760 
| | | | 
'38 | -56208 | -56280 | -56422 | -56563  -56632) -50702, -56771| -56839| -56907| -71 
| :32760 | -32635 -32384 :32003 | -31876 | -31748 -31620 | -31492 
'20 | .56907 | .56975 -57109 "57308 | -57373 | -57438 | -57503| -57507| -70 
31492 | .31363 -31104 | -30713 | -30583 | -30451| -30320| -30188 
0-57631 | 0-57694 0-57944 | 0-58005 | 0-58066 | 0-58127 | 0-58188 | 0-69 
-30056 | -29923 -29524| -29390| -29256 | -29122| -28987 | -28852 
I 
58248 «58425 | -58484| -58542| -58000| -58057| -58714| -58771| -68 
38717 -28310| -28173| -28037| :27900 | -27762| -27625 | -27487 | 
| -59103| -59158 . 59318 | -67 
| | -26515 | -26235 | -26095 
'83 | .59318 | «50478 | :59079 | 59730 | -59780 | -59829 | -66 
26095 | .25954 35672 | 26105, .24963  -24820  -24677 
"84 | .59899 | «5997 xu 24| -60072| -60119| -60166 -60260 | -60 “65 
59829 -59 :59976 | -60024| -60072 260 306 | -65 
24677 REPE 24248| -24104| -23960| -23816| -23071 -23382 | -23237 
0-35 | 0.66306 | T 531 | 0-60575 | 0-60619 | 0-60662 | 0-60706 | 0-607 
*00306 | 0-6035 -60397 | 0-604492 | 0-60487 | 0-603531 | 0-60575 2 706 60748 | 0-64 
deren Deeds | 29801 | 22685 | -22509 | -22363| -22216 | -22070| -21923| -21776 
OU) eael nns | .g0875| -60916 | «60957 | -60998/ -61038| -61078 | -61118 | -O1188| - 
748 | -6070 -60833 - 5| :60916 | :60957 3 z 58 63 
21776) oraal -50833 | GoT | .eilso| .21038{ -20800| -20741 20598 | 20444 | -20295 
387 | sgyyeasl z 31312 | -61350 | -61387| -61425 | -61462| -61498| -61534| -62 
158| - -01236 | -61274 | -61312 | -61350 7 f 5 62 
20205 | aaraa] 21236 | 01974 | 01313) 19549 | -19399| -19249 | -19098 “18948 | -18798 
‘38 | .6155 E ; : 31676 | -617 -61745 | -61779 | -61812| -61846 | -61879 | -6 
534| -61570| .6 3| -61641 | -61676 61711 1 | a 1389 187 1 
18708 | -18647 18496 -18345 | -18194| -18043| :17891| -17740| -17588| -17436 | -17284 
39 | .61g -62039 | -62070| -62101| :62132  .62162| -62192, .6 
879| .61912| «6 -61976 | -62008| -62039 | :6207 2 32 -62 2192, .60 
‘Voss | 1019 61044) -61976| 02003 | .10522| -16369| -16216| -16063| -1910| -15756 
%40 | 9 7 5 | 0-62392 | 0-62420 | 0-62447 | 0-6247. 5 
"62192 | 0-682221 | 0-62251 | 0-62280 | 062308 | 062337 | 062365 9 EE EE 2474 | 0-59 
-15756 peas pos 0-62280 | 15141] -14987| -14833| -14679| -14525 “14370 | -14216 
41 | g a -62628 | -62653| -62677| -62701| -02725| -58 
2474 | -625 -62526 | -62552| -62578| :62603, :6202 s x o 
14216 | (laon | 42526 09552] “3506 | -13441| -13986 | -13130| -12975| -12819 | -12664 
42 -62861| -62883| -62004| -62925 -62946) .57 
'02795 | .627 -62772 | .62794| 62817 | :02839 EU x 3 ee 
; 12664 ks e bes 719040 | -11884| -11728| -11572) -11415| -11259 | -11102 
43|. -63064| -63082| -63101| -63119| -63137| .56 
62946 | . a 6 -63026 | -63045 | :630 5 
: -11102 nae E E 10473 | -10318| -10161| -10004| -09846| -09689 | -09532 
"4|. -63237 | -63252| -63268| -63283, -63298| .55 
63137 | .63154| . s .63205 | -63221 | :63237 0897 3 22. 
09539 od pon CE. ‘08901 | -08744| -08586| -08428| -08270| -08112| -07954 
045 . 0-63405 | 0-63417 | 0-6 
0-63298 | o. S ; .63354 | 0-63307 | 0-63380 | 0-63393 5 417 | 0-63429 | 0-54 
07954 10:019 E oi icr .07163| -07005| -06846 | -06688| -06529| -06370 
"46 | 83499 -63441| -63452| -03463| -63473| -63484| -63494| -63503| :63513 | -63522| -03531 | -53 
«4x | 09370! .06212| .06053| -05894| 05736 | -05577| -05418 -05259 | -05 : 941| -04782 
7 | :63531| .63540| -e3548 -03556| -63564| -63571 HE E db] Qe | 7 
4g | 04782] .04623| .04464| -04305| -04146| -03986) :03827 
5| -63604 5 20| -63625| -63629| -63633| :63637 | -63041]| -63644| -63647| .51 
5 "63609 | -63615 | :63620 2 o3 | .02234| -02074| -01915| -01755| -01596 
a 03190 *03031| -02871 | -02712| -02552| -02393 on 
“63647 |. s| -63657| -63658 | -63660 | -63661| -63661) -63662| -63662/ -50 
E 63650| -63653| -63655 5 .00638 | -00479  -00319| -00160! -00000 
~ | 91596| -01436| .o1977| -01117| -00957| -00798 
ijina a 
9 8 7 6 a : I a . 
p 


420 Tables for estimating normal distribution function by normit analysis 


Part II. COMPARISON BETWEEN THE MINIMUM NORMIT χ² ESTIMATE
AND THE MAXIMUM LIKELIHOOD ESTIMATE


1. Description of the investigation 


The statement in Part I (p. 411), characterizing the minimum normit χ² estimate as having
smaller variance and mean square error than the maximum likelihood estimate, was made in
a preliminary version of the present paper, on the basis of results obtained in a planned
programme of sampling experiments (Berkson, 1955a). The editor of Biometrika, in his
comments on the manuscript, conveyed the suggestion that perhaps these experiments were
not a sufficient support for so firm a statement. Originally I had thought that the statement
was supported well enough. However, on reflexion, I realized that my own conviction was
based, in addition to the specific sampling results cited, on cumulated auxiliary results
gained in the course of investigations of similar questions going over many years. I decided
that the editor's comment was sound, for it was not to be expected that an average reader, if
[...] would sympathetically seek out all this evidence. And [...] concluding section.


It was decided in the first place to calculate thi[...] actual summation over the total
sam[...]; for which even good approxi[...] questions concerned with the lower bounds [...]
similar reasons the distributions of the sam[...] and variance for an auxiliary experiment,
corresponding to the estimate of relative potency, were calculated using a stratified random
sample. The situation dealt with simulates a bio-[...]


The minimum normit χ² estimates of α and β have been given in equations (5) and (6) of
Part I above, together with definitions of p_i, q_i, X_i, w_i. The number n_i exposed to risk
at x_i is 10 for all values of i. The values of X_i and Z_i required for evaluation of (5)
and (6) were obtained as for Table 1, [...] and the estimates were written to four decimal
places correct to ±1 in the last place. Each estimate was checked by inserting the values in
the estimating equations

$$\sum_i n_i w_i (X_i - \hat{X}_i) = 0, \tag{12}$$

$$\sum_i n_i w_i x_i (X_i - \hat{X}_i) = 0, \tag{13}$$

where X̂_i = α̂ + β̂x_i. The check of each estimate was performed by a different computer from
the one who calculated its value. Because of symmetry among the samples, not all 1331 of them
required independent computation. For the estimate of both α and β, only about a quarter, and
for the estimate of α alone, about half of the samples required independent calculation, the
estimates for the uncomputed samples being obtainable from those computed, by appropriate
change of sign.
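
The check just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the computing routine used for the paper: it assumes the normit line X̂_i = α̂ + β̂x_i and the unit weight w_i = z_i²/(p_i q_i) evaluated at the observed proportions, fits the line by weighted least squares, and then evaluates the left-hand sides of (12) and (13). The function names and the sample of 2, 6 and 7 responses out of 10 are hypothetical.

```python
from statistics import NormalDist

ND = NormalDist()  # standard normal distribution

def min_normit_chi2(x, r, n):
    """Weighted least-squares fit of the observed normits X_i on the doses x_i.

    x: doses, r: numbers responding, n: numbers exposed (requires 0 < r_i < n_i).
    Returns (alpha_hat, beta_hat, weights, observed_normits).
    """
    p = [ri / ni for ri, ni in zip(r, n)]
    X = [ND.inv_cdf(pi) for pi in p]                      # observed normits
    # assumed unit weight: w = z^2 / (p q), z being the normal ordinate at X
    w = [ND.pdf(Xi) ** 2 / (pi * (1 - pi)) for Xi, pi in zip(X, p)]
    nw = [ni * wi for ni, wi in zip(n, w)]
    total = sum(nw)
    xbar = sum(a * xi for a, xi in zip(nw, x)) / total    # weighted mean dose
    Xbar = sum(a * Xi for a, Xi in zip(nw, X)) / total    # weighted mean normit
    sxx = sum(a * (xi - xbar) ** 2 for a, xi in zip(nw, x))
    sxX = sum(a * (xi - xbar) * (Xi - Xbar) for a, xi, Xi in zip(nw, x, X))
    beta = sxX / sxx
    alpha = Xbar - beta * xbar
    return alpha, beta, w, X

def check_equations(x, n, w, X, alpha, beta):
    """Left-hand sides of the estimating equations (12) and (13)."""
    resid = [Xi - (alpha + beta * xi) for xi, Xi in zip(x, X)]
    eq12 = sum(ni * wi * e for ni, wi, e in zip(n, w, resid))
    eq13 = sum(ni * wi * xi * e for ni, wi, xi, e in zip(n, w, x, resid))
    return eq12, eq13

x, r, n = [-1, 0, 1], [2, 6, 7], [10, 10, 10]   # hypothetical sample
alpha, beta, w, X = min_normit_chi2(x, r, n)
print(round(alpha, 4), round(beta, 4), check_equations(x, n, w, X, alpha, beta))
```

Both left-hand sides should vanish up to rounding error, which is the sense in which an estimate "checks".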

Since for p_i = 0 or p_i = 1, the corresponding value of X_i is infinite, some rule must be
adopted to modify these observations, if a finite estimate is to be obtained with samples
containing either of them. I have used the rule of substituting for an observation of p_i = 0,
a working value p_i = 1/(2n_i), and for p_i = 1, a working value p_i = 1 − 1/(2n_i).*
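
As a small illustration of this rule, assuming r responses out of n exposed (the function name is hypothetical):

```python
def working_proportion(r, n):
    """Observed proportion r/n, with 0 replaced by 1/(2n) and 1 by 1 - 1/(2n)."""
    if r == 0:
        return 1 / (2 * n)
    if r == n:
        return 1 - 1 / (2 * n)
    return r / n

# with n = 10 the working values for 0% and 100% response are 1/20 and 19/20
print([working_proportion(r, 10) for r in (0, 3, 10)])
```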

To obtain the maximum likelihood estimate an iterative procedure must be used. For the
present experiment two methods were employed. For the most part the procedure utilizing
the linear transform as outlined by Fisher & Yates (1953) was used, and with this the tables
of Finney & Stevens (1948) were utilized; secondarily the method set forth by Garwood
(1941) was employed, for which convenient tables have been provided by Cornfield &
Mantel (1950). The two methods are fairly comparable in respect to the amount of
arithmetical labour involved, and no regular difference was noted in regard to speed of
convergence, but the linear transform method is more directly analogous to the common method
of fitting a straight line by least squares, and therefore more familiar to the computers; this
was the chief reason for preferring it. The estimates as used for calculation of the statistics
were to be correct to better than ±1 in the third decimal place and therefore were set down
with four decimal places, the last being correct to ±5. In order to attain estimates of the
desired precision, the following procedure was used: with α̂ and β̂ (provisional values of
the estimates) decided on, δα̂ and δβ̂ (the respective corrections to these for improved
estimates) were computed using the Finney-Stevens tables. If either δα̂ or δβ̂ was as large as
0.01, another iteration was performed in the same way, and this process was continued until
both δα̂ and δβ̂ were less than 0.01. The argument of the Finney-Stevens tables is carried to
two decimal places only and cannot conveniently be used beyond this stage of the iterations.

* This is an old 'empirical' rule, the origin of which I do not know. Another rule has been
advanced by Reed (1936), still another by Gaddum (1933), and doubtless there are others besides
these which have been employed. Elsewhere I (Berkson, 1955b) have set forth some considerations
on the basis of which the [...] rule may be considered reasonable, but I do not advance them as
necessarily preferable to any other. It should be noted that the [...] the minimum normit χ²
estimate and the maximum likelihood [...] the different methods used for dealing with the [...].
With maximum likelihood the rule [...] the application, to the particular observations of zero
or 100% response, of a mod[...] applied to all observations (Fisher, 1954), whereas in minimum
normit χ² [...] observations of zero and 100%, there is no modification of the [...] of the
observation itself. And of course there is the difference in the weights: in maximum likelihood
estimation the weight for unit observation is Z²/PQ, where the quantities represented correspond
to the maximum likelihood estimates, whereas in the minimum normit χ² estimation the unit
weight is z²/pq, where the quantities represented correspond to the observations directly.

From this point on, the computations were accomplished using the W.P.A. tables (Lowan,
1942), which are more detailed though less convenient than the Finney-Stevens tables.
Iteration was continued until both δα̂ and δβ̂ were less than 0.0005. The estimates obtained
with these adjustments, written to four decimal places,* were checked by insertion in the
estimating equations

$$\sum_i \frac{n_i \hat{Z}_i}{\hat{P}_i \hat{Q}_i}\,(p_i - \hat{P}_i) = 0, \tag{14}$$

$$\sum_i \frac{n_i x_i \hat{Z}_i}{\hat{P}_i \hat{Q}_i}\,(p_i - \hat{P}_i) = 0, \tag{15}$$

where P̂_i = 1 − Q̂_i is the estimate of P_i, the probability at x_i, given by the estimated
parameters, Ẑ_i is the corresponding normal ordinate and p_i is the sample value of the relative
frequency. As in the case with the minimum normit χ² estimate, independent computation
was necessary for only about half of the samples for the estimate of α alone, and for about
a quarter of the samples for the estimate of both parameters.
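
The iteration and the final check can be sketched as follows. This is a generic scoring iteration written for illustration, assuming the model P_i = Φ(α + βx_i); it is not the table-based linear-transform procedure of Fisher & Yates described above, although it stops by the same kind of rule (both corrections below a tolerance). Function names, starting values and the sample are hypothetical.

```python
from statistics import NormalDist

ND = NormalDist()  # standard normal distribution

def score(x, r, n, alpha, beta):
    """Left-hand sides of the estimating equations (14) and (15)."""
    s0 = s1 = 0.0
    for xi, ri, ni in zip(x, r, n):
        P = ND.cdf(alpha + beta * xi)
        Z = ND.pdf(alpha + beta * xi)
        term = ni * Z / (P * (1 - P)) * (ri / ni - P)
        s0 += term
        s1 += term * xi
    return s0, s1

def ml_normit(x, r, n, alpha=0.0, beta=0.5, tol=0.0005, max_cycles=50):
    """Apply corrections (d_alpha, d_beta) until both are smaller than tol."""
    for cycle in range(1, max_cycles + 1):
        a = b = c = 0.0                      # elements of the information matrix
        for xi, ni in zip(x, n):
            P = ND.cdf(alpha + beta * xi)
            Z = ND.pdf(alpha + beta * xi)
            W = ni * Z * Z / (P * (1 - P))
            a += W
            b += W * xi
            c += W * xi * xi
        s0, s1 = score(x, r, n, alpha, beta)
        det = a * c - b * b
        d_alpha = (c * s0 - b * s1) / det    # solve the 2 x 2 system for the corrections
        d_beta = (a * s1 - b * s0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) < tol and abs(d_beta) < tol:
            break
    return alpha, beta, cycle

x, r, n = [-1, 0, 1], [2, 6, 7], [10, 10, 10]   # hypothetical sample
alpha, beta, cycles = ml_normit(x, r, n)
print(round(alpha, 4), round(beta, 4), cycles, score(x, r, n, alpha, beta))
```

Samples for which the maximum likelihood estimate is infinite would not converge under this sketch and would have to be detected separately.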

The number of iterative cycles required depends of course on how good an approximation
the provisional values α̂₀, β̂₀ are of the maximum likelih[...] experiment, the samples were
ordered s[...] differing from the previous one in only [...] the computers to guess the values
of [...] in others two with the Finney-Stevens tables, and only one with the W.P.A. tables.
At most we used five cycles, while some [...]uch depends on the sample, and on how [...]lly,
no good judgement can be made as to [...]tain a satisfactory maximum likelihood [...]nnot
depend on just how many cycles have [...]ctive methodical scheme by which one can [...]

[...] punch card together with its sample for identification. By contract with a service
bureau of the International Business Machines Corporation, the square of the estimate of each
parameter and their product were calculated with the use of IBM calculating machine number 164
and checked with the same machine, and the several results were punched into the same card.
Other multiplications required for the calculations were performed in the same way.

* In the corresponding experiment with the lo[...] computed, correct to one more [...] than
was necessary for the [...]

The 1331 cards thus prepared, each punched with its sample, the estimate α̂, the estimate
β̂, the square of each, and their product, constituted the basic instrument for [...] of the
computations. Corresponding to each dosage arrangement for which statistics were to be
computed, it was necessary now to calculate the probability of each sample. To wit: in
the dosage arrangement with P_c = 0.5, the probability P at the lowest dosage is 0.3 and the
probability P at the highest dosage is 0.7. For each of the eleven [...] dosage the probability
was directly calculated, [...] checked, and hand-punched into auxiliary cards. Each combination
of three of these, one corresponding to each of the three dosages, was multiplied together in
two steps with the use of the automatic multiplying punch machine to give the probability of
each sample, and each multiplication was checked. The resulting 1331 probabilities were
transferred to the corresponding main cards with the use of a reproducing punch machine
preparatory to multiplying them by the estimates, their squares and products. These
multiplications were accomplished with the use of the automatic multiplying punch, and the
results were punched into the same main card and checked. The sums of estimates, sums of
squares and sums of products necessary for calculation of the required statistics were now
obtained by totalling these quantities with the use of an available IBM punch-card tabulator.
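
The punched-card totalling amounts to an exact summation over all 11³ = 1331 possible samples. A minimal sketch of that summation, for the minimum normit χ² estimate of α with β treated as known, might look as follows. It assumes 10 exposed at each of the doses −1, 0, +1, β = 0.524401, the 1/(2n) working-value rule, and the closed form α̂ = Σw_i(X_i − βx_i)/Σw_i that weighted least squares gives when only α is fitted. Unlike the paper, it does not omit the rare samples for which maximum likelihood is insoluble; all names are hypothetical.

```python
from itertools import product
from math import comb
from statistics import NormalDist

ND = NormalDist()
N = 10                 # number exposed at each dose
BETA = 0.524401        # slope, treated as known
DOSES = (-1, 0, 1)

def binom(r, n, P):
    """Binomial probability of r responses out of n."""
    return comb(n, r) * P ** r * (1 - P) ** (n - r)

def working_p(r, n):
    """Working value rule for 0% and 100% response."""
    return 1 / (2 * n) if r == 0 else 1 - 1 / (2 * n) if r == n else r / n

def alpha_min_normit(sample):
    """Minimum normit chi-square estimate of alpha when beta is known."""
    num = den = 0.0
    for x, r in zip(DOSES, sample):
        p = working_p(r, N)
        X = ND.inv_cdf(p)                       # observed normit
        w = ND.pdf(X) ** 2 / (p * (1 - p))      # unit weight z^2 / (p q)
        num += w * (X - BETA * x)
        den += w
    return num / den

def exact_moments(true_P, alpha_true):
    """Sum probability-weighted estimates and squares over every sample."""
    tot = m1 = m2 = 0.0
    for sample in product(range(N + 1), repeat=3):
        prob = 1.0
        for r, P in zip(sample, true_P):
            prob *= binom(r, N, P)
        est = alpha_min_normit(sample)
        tot += prob
        m1 += prob * est
        m2 += prob * est * est
    bias = m1 / tot - alpha_true
    var = m2 / tot - (m1 / tot) ** 2
    return tot, bias, var, var + bias ** 2

tot, bias, var, mse = exact_moments((0.3, 0.5, 0.7), 0.0)
print(round(tot, 6), round(bias, 6), round(var, 4), round(mse, 4))
```

For the symmetrical arrangement the bias should vanish by symmetry, and the probabilities should total 1.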

For dosage arrangement with probability of the central dosage P_c = 0.5, the dosage values
x_i at the successive dosages were taken as −1, 0, +1, and the values of α̂ and β̂, the estimates
of α and β, respectively, which were punched into the master cards, were entered as for this
dosage arrangement. With other equally spaced dosage arrangements, for any specific
sample, β̂, the estimate of β, will be the same, but the estimate of α will be different and is
given by

$$\hat{\alpha} = \hat{\alpha}_{0.5} - C\hat{\beta}, \tag{16}$$

where α̂_0.5 is the estimate for that sample with P_c = 0.5, and C = β⁻¹ × normit of P_c.
Accordingly the estimates did not have to be computed anew for each new experimental
set-up, since all the experiments dealt with were for three dosages, equally spaced. It was
necessary, however, to calculate anew the probability of each sample, from the probabilities
corresponding to the new dosage positions.* These were punched into the main cards, the
values α̂_0.5, α̂²_0.5, β̂, β̂², α̂_0.5 β̂ having already been punched into them. The five
multiplications of the sample probability by these quantities were accomplished, and the results
were punched into the cards and checked as for the calculations with P_c = 0.5. By totalling
these, the required sums of estimates, the sums of squares and the sums of products were
obtained, and from them the required statistics were computed for the new dosage arrangement,
taking into account relation (16).
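
The true probabilities at the three doses of each arrangement follow from the central probability alone. A minimal sketch, assuming doses −1, 0, +1, the slope β = 0.524401 quoted later in the text, and α equal to the normit of the central probability (the function name is hypothetical):

```python
from statistics import NormalDist

ND = NormalDist()
BETA = 0.524401   # slope quoted in the text for the experiment with beta known

def true_probabilities(p_central, beta=BETA):
    """True P at the low, middle and high dose for a given central P."""
    alpha = ND.inv_cdf(p_central)            # normit of the central probability
    return tuple(ND.cdf(alpha + beta * x) for x in (-1, 0, 1))

for pc in (0.5, 0.6, 0.7, 0.8):
    print(pc, [round(P, 3) for P in true_probabilities(pc)])
```

To three decimals these agree with the "True P at dose" columns of Tables 4 to 6.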


2. Results 


The primary purpose of these experiments was to compare the error of the minimum
normit χ² estimate with the error of the maximum likelihood estimate. An indefinite
number of criteria for such a comparison are conceivable; I used the classic measure of error,
the mean square error, that is, the expected value of the square of the difference of the
estimate from the true value of the parameter.

This cannot be the place for an adequate discussion of such a fundamental idea as the
concept of error. However, I should like to note that at the beginning of these investigations,


* It is of some interest to note that this is not the case for the equivalent situation with the
logistic function. The [...] sufficient statistics which for [...] of samples, within sets of
samples [...] same values for these statistics [...] the same, independent of the change of
pa[...] of a set, with a change of parameter, therefore can be obtained from [...] original
value of the [...] by multiplication with a determinable constant. [...] is equivalent to a
change of the parameter α, and for the [...] being known, since there are only 31 sets of
samples defined by Σn_i p_i, the required probabilities with change of [...] can be determined
by 31 simple multiplications instead of 1331 more involved [...]. This facilitation illustrates
the great general statistical advantage of working with a function that possesses simple
sufficient statistics for estimating its parameters.




I did consider and make calculations on the basis of other criteria, as well as with the mean
square error. In particular I explored the idea of comparing the discrepancy of the estimates
from their parameters directly in terms of the distribution of the estimates. This is related to
what Savage (1954) calls the 'principle of inadmissibility', and to what Pitman (1937) has
investigated as the 'closeness' of an estimate. According to this principle, if T is an estimator
of θ and e is the error (T − θ) of estimate, then if the probability of an error equal to or
greater than |e| is smaller for an estimator T₁ than for T₂ (excluding the values of e for which
the probabilities are equal) for all values of e and for all values of θ, then T₁ is a better
estimator than T₂ in an absolute sense. Intuitively this principle seems incapable of denial,
and I am not disposed to question it here. However, I do not believe that it can be very useful
in providing a criterion for comparison of estimates in situations which are met in practice.
For it will only rarely occur that two seriously recommended estimates are comparable in this
respect at all; it will not even be typical for one to be better than the other in this
stringent sense at any given θ, let alone one being better than the other in this sense at
all θ. [...] than T₂ in the [...] = 0.5. In this case the [...] of the parameter is zero, while
for T₂ it is 0.5, so that, for instance, the probability of an error equal to or greater than
|0.1| is 1 for T₁ and 0.5 for T₂. In comparing the minimum normit χ² estimate with the maximum
likelihood estimate in terms of the distribution of the estimates, I encountered just such
'paradoxes'. Although, philosophically, the mean square error cannot be considered necessarily
the only relevant measure of error, I hardly believe that a better criterion for general
application will be found. This is to say, if for some practical situati[...] a smaller mean
square error than T₂, for all reasonab[...] T₁ will be found better than T₂ by some other
reaso[...]


[...] as the number n_i at each x_i is increased in[...] approaches a normal distribution the
variance of which is equal to or less than that of any other similarly distributed estimate.
It is to be noted that the properties refer to the never-attained limiting distribution and
that mathematically asymptotic efficiency does not imply necessarily any attribute of the
estimate with finite samples. However, it is reasonable to assume, for practical cases like the
present one, that for large n_i at each x_i, both estimates will have about the same variance.
In one experiment with three [...] corresponding respectively to true P's 0.3, 0.5, 0.7, 50 at
e[...] stratified random sample, this was corroborated. With β = 0.524401 considered known and
α = 0 to be estimated, the theoretical value of the asymptotic variance correct to six decimal
places was 0.011186, the determined value of the vari[...] maximum likelihood estimate was only
slightly higher, namely 0.011191 (Berkson, 1955a). For both estimators, to four significant
figures and five decimal places the variance had attained its asymptotic value.
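
The theoretical asymptotic variance quoted here can be checked numerically. The sketch below assumes that it is the reciprocal of the information Σ n Z²/(PQ) for α, evaluated at the true probabilities 0.3, 0.5, 0.7 with 50 exposed at each dose; the function name is hypothetical.

```python
from statistics import NormalDist

ND = NormalDist()

def information_alpha(n, true_P):
    """Sum over doses of n Z^2 / (P Q), Z being the normal ordinate at the normit of P."""
    total = 0.0
    for P in true_P:
        Z = ND.pdf(ND.inv_cdf(P))
        total += n * Z * Z / (P * (1 - P))
    return total

asymptotic_variance = 1 / information_alpha(50, (0.3, 0.5, 0.7))
print(round(asymptotic_variance, 6))   # compare with the 0.011186 quoted in the text
```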

With small samples, comparison of the error of the estimates is complicated by the fact
that for some samples the maximum likelihood estimate is infinite. In each such case the
sample was omitted in computing the statistics of the minimum normit χ² estimate as well as
of the maximum likelihood estimate. With probability at central dosage P_c = 0.5, either in
the case of estimating only α or in that of estimating both parameters, the frequency of such
samples in the sampling population is so very small that their omission can reasonably be
considered not to affect the virtual comparison of the estimates. But as the dosage
arrangement is changed from the symmetrical one with P_c = 0.5, to asymmetrical arrangements
with P_c appreciably different from 0.5, the probability of such samples increases, and the
calculation of the statistics omitting them becomes questionable. I have made comparisons only
for situations in which the samples yielding an infinite estimate by maximum likelihood
constitute less than 5% of the sampling distribution. When both parameters are to be
estimated, with P_c = 0.9, 25.1% of the samples yield infinite estimates by maximum
likelihood. The upper limit of the experimental arrangements was therefore taken as the one
corresponding to P_c = 0.8, for which the probability of samples with an infinite maximum
likelihood estimate is 4.94%. In order to co-ordinate the computation, the experiments
estimating only α were limited to this region also. Only experiments with P_c ≥ 0.5 are
described, since the relation between the estimators is the same for any P_c < 0.5 as for the
one with P_c = 1 [...].

The [...] for a bio-assay experiment is with dosages x symmetrically disposed around the
value for which P = 0.5, for it is with this arrangement that the variances of the estimates
are smallest. Of course, without knowing the values of the parameters exactly an experiment
cannot be accomplished with just this [...], but by preliminary experimentation the position
of x for which P = 0.5 [...]mately, and a well-designed experiment should not be done with a
central x [...] different from this value. In a sense, therefore, the comparison of the
estimates [...] other when P_c = 0.5 is the most important one, since it compares the [...]
from which good experiments will not be far removed. With [...] the estimate of α is unbiased,
both for the minimum normit χ² [...] likelihood estimator, whether it is estimated alone or
[...] with β. [...] in this arrangement, both estimators are biased, and for [...] of each
parameter is biased for both estimators. Unbiased [...] to be exceptional, limited to α, and
even in that case only with the [...]. The results are shown in detail in Table 4 and Table 5
[...] known (Table 4), the bias is positive for the maximum likelihood estimate, and negative
for the minimum normit χ² estimate, the absolute value of the latter being the larger of the
two. The variance is in all cases smaller for the minimum normit χ² estimate, and [...]
smaller to outbalance the larger bias, the net result being [...] minimum normit χ² estimate
in all the experiments. In the c[...] simultaneously (Table 5), the bias is negative for α and
positive for β, [...]

* I do not mean to imply that unbiasedness is necessarily a desideratum. It is not obvious
that, say, an unbiased estimator T whose variance is larger than the mean square error of a
biased estimator T′ is preferable to it. I should prefer T′.

normit χ² estimate, which at P_c = 0.8 is positively biased for α and negatively for β. The
variance is smaller for the minimum normit χ² estimate, of either parameter, alone or both
simultaneously, and with all dosage arrangements, and so also is the mean square error. It
may be noted that whereas in the case of α alone to be estimated, the minimum normit χ²
estimate achieves a smaller mean square error in spite of its bias having a larger absolute
value than the maximum likelihood estimate, in the case of both α and β to be estimated, for


Table 4. Estimate of α with β known

Based on the total sampling population. The samples with infinite estimate by maximum likelihood
are omitted in calculating all statistics. The greatest frequency of such samples is 0.0494,
occurring in the experiment with P_c = 0.8.

 True P at dose        |          Bias           |        Variance         |    Mean square error    |  1/I   | Mean square error ÷ 1/I
 Low   | Mid. | High   | Max. lik. | Min. normit | Max. lik. | Min. normit | Max. lik. | Min. normit |        | Max. lik. | Min. normit
       |      |        |           | χ²          |           | χ²          |           | χ²          |        |           | χ²
 0.3   | 0.5  | 0.7    | 0         | 0           | 0.0589    | 0.0537      | 0.0589    | 0.0537      | 0.0559 | 1.054     | 0.961
 0.393 | 0.6  | 0.782  | 0.0068    | −0.0073     | 0.0606    | 0.0546      | 0.0606    | 0.0547      | …      | …         | 0.958
 0.5   | 0.7  | 0.853  | 0.0152    | −0.0177     | 0.0663    | 0.0572      | 0.0665    | 0.0575      | 0.0612 | 1.087     | 0.940
 0.624 | 0.8  | 0.914  | 0.0287    | −0.0401     | 0.0802    | 0.0598      | 0.0810    | 0.0614      | 0.0706 | 1.147     | 0.870


Table 5. Estimate of α and β simultaneously

Based on the total sampling population. The samples with infinite estimate by maximum likelihood
are omitted in calculating all statistics, the frequency of such samples being shown in the last
column. Entries shown as … are not legible in the source.

 True P at dose        |          Bias           |        Variance         |    Mean square error    | % samples insoluble
 Low   | Mid. | High   | Max. lik. | Min. normit | Max. lik. | Min. normit | Max. lik. | Min. normit | by max. likelihood
       |      |        |           | χ²          |           | χ²          |           | χ²          |
 Estimate of α
 0.3   | 0.5  | 0.7    | 0         | 0           | 0.0675    | …           | 0.0675    | 0.0588      | 0.09
 0.393 | 0.6  | 0.782  | −0.0027   | −0.0079     | 0.0880    | …           | …         | …           | …
 0.5   | 0.7  | 0.853  | −0.0135   | −0.0041     | 0.1587    | …           | …         | …           | …
 0.624 | 0.8  | 0.914  | −0.0278   | 0.0517      | 0.3698    | …           | …         | …           | …
 Estimate of β
 0.3   | 0.5  | 0.7    | 0.0467    | 0.0288      | 0.1109    | …           | 0.1131    | 0.0955      | 0.09
 0.393 | 0.6  | 0.782  | 0.0505    | …           | 0.1172    | …           | …         | …           | …
 0.5   | 0.7  | 0.853  | 0.0588    | 0.0018      | 0.1339    | 0.0898      | …         | …           | …
 0.624 | 0.8  | 0.914  | 0.0526    | −0.0564     | 0.1494    | 0.0821      | 0.1522    | …           | …

a number of the dosage arrangements the absolute value of the bias as well as the variance
is smaller for the minimum normit χ² estimate.
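
The way a smaller variance can outbalance a larger bias is easy to verify from the identity mean square error = variance + bias². The sketch below applies it to the bias and variance figures as read from Table 4 (estimate of α with β known); the tolerance allows for the four-decimal rounding of the printed entries, and the names are hypothetical.

```python
# (P_c, bias ML, bias min-normit, var ML, var min-normit, MSE ML, MSE min-normit),
# figures as read from Table 4
rows = [
    (0.5, 0.0,    0.0,     0.0589, 0.0537, 0.0589, 0.0537),
    (0.6, 0.0068, -0.0073, 0.0606, 0.0546, 0.0606, 0.0547),
    (0.7, 0.0152, -0.0177, 0.0663, 0.0572, 0.0665, 0.0575),
    (0.8, 0.0287, -0.0401, 0.0802, 0.0598, 0.0810, 0.0614),
]

def mean_square_error(bias, variance):
    """Mean square error as variance plus squared bias."""
    return variance + bias ** 2

for pc, b_ml, b_mn, v_ml, v_mn, m_ml, m_mn in rows:
    ok_ml = abs(mean_square_error(b_ml, v_ml) - m_ml) < 1.5e-4
    ok_mn = abs(mean_square_error(b_mn, v_mn) - m_mn) < 1.5e-4
    print(pc, round(mean_square_error(b_ml, v_ml), 4),
          round(mean_square_error(b_mn, v_mn), 4), ok_ml, ok_mn)
```

At P_c = 0.8, for example, the minimum normit χ² bias is larger in absolute value (0.0401 against 0.0287), yet its squared bias adds only about 0.0016 to a variance that is smaller by about 0.02.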


3. Lower bound of mean square error 
In the case of one parameter to be estimated, with certain regularity conditions obtaining
which are well fulfilled in the present problem, the lower bound for the mean square error of
an estimate T of a parameter θ is given by*

$$E(T - \theta)^2 \ge \frac{(1 + \partial b/\partial\theta)^2}{I} + b^2, \tag{17}$$

and the lower bound of the variance is the first term of the right-hand side of (17), where
b = E(T − θ) is the bias of the estimate T, and I is Fisher's amount of extractable
information, the value of which is

$$I = E\left(\frac{\partial \ln \phi}{\partial \theta}\right)^2, \tag{18}$$

where φ is the probability of the sample. The parameter to be estimated being α, the value of
I is

$$I(\alpha) = \sum_i \frac{n_i Z_i^2}{P_i Q_i}. \tag{19}$$

For the evaluation of the lower bound (17), I(α) can be calculated directly, but ∂b/∂α is
also required. This was evaluated by plotting the bias b against α and estimating the slope
of the bias function graphically, at required values of α.† The bias function for both
estimators is shown in Fig. 2.

[Fig. 2: curves for maximum likelihood and for minimum normit χ²; abscissa: P of central dose, α = normit of P_c.]

Fig. 2. Bias of α̂ in relation to α; estimate of α with β known. The evaluation of ∂b/∂α
required for calculation of the lower bound (17) was obtained graphically from an enlarged
replica of this figure.

* This inequality is frequently referred to as the 'Cramér-Rao lower bound'. It was, however,
developed by Fréchet, and in a very clear way, earlier than by these authors (Savage, 1954).

† Change of the dosage arrangement from one with P_c = 0.5 to some other value of P_c is
equivalent to retaining the original dosage disposition and changing from α = 0 to
α = normit of P_c.

The lower bounds of the maximum likelihood estimator and the minimum normit χ²
estimator are given in Table 6, and shown in Fig. 3 together with the corresponding mean
square errors. It is seen that neither estimate attains the lower bound for its mean square
error, and that the lower bound of the minimum normit χ² estimator is lower than that of
the maximum likelihood estimator. The attainment of a smaller lower bound by the minimum
normit χ² estimator is directly related to the character of its bias function, the first
derivative of which is everywhere negative, while that of the maximum likelihood estimator
is everywhere positive (Fig. 2).
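
The dependence of the bound on the slope of the bias can be made concrete with a short sketch. It assumes the bound has the form (1 + ∂b/∂α)²/I + b², with I(α) the sum of n Z²/(PQ) over the doses, and uses 10 exposed at each of three doses with true P of 0.3, 0.5, 0.7. The slope is not taken from the paper (it was read graphically there); instead the sketch back-solves the slope that would reproduce the bound 0.0585 given in Table 6 for maximum likelihood at P_c = 0.5, where the bias is zero. Names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

ND = NormalDist()

def information_alpha(n, true_P):
    """I(alpha): sum of n Z^2 / (P Q) over the doses."""
    return sum(n * ND.pdf(ND.inv_cdf(P)) ** 2 / (P * (1 - P)) for P in true_P)

def mse_lower_bound(bias, bias_slope, information):
    """Variance bound (first term) plus squared bias."""
    return (1 + bias_slope) ** 2 / information + bias ** 2

I_sym = information_alpha(10, (0.3, 0.5, 0.7))
print(round(1 / I_sym, 4))                      # bound for zero bias and zero slope

# slope of the bias function implied by a bound of 0.0585 at zero bias
implied_slope = sqrt(0.0585 * I_sym) - 1
print(round(implied_slope, 3), round(mse_lower_bound(0.0, implied_slope, I_sym), 4))
```

A positive slope raises the bound above 1/I and a negative slope lowers it, which is the point made above about the two bias functions.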


[Fig. 3: mean square error and lower bound curves for maximum likelihood and for minimum normit χ²; abscissa: P of central dose, α = normit of P_c.]

Fig. 3. Mean square error and lower bound of the mean square error; estimate of α with β known.


Table 6. Lower bound of the mean square error; estimate of α with β known

Entries shown as … are not legible in the source.

 True P at dose        | Maximum likelihood  | Minimum normit χ²
 Low   | Mid. | High   | L.B.     | M.S.E.   | L.B.     | M.S.E.
 0.3   | 0.5  | 0.7    | 0.0585   | 0.0589   | …        | …
 0.393 | 0.6  | 0.782  | 0.0603   | 0.0606   | …        | …
 0.5   | 0.7  | 0.853  | 0.0656   | 0.0665   | …        | 0.0575
 0.624 | 0.8  | 0.914  | 0.0784   | 0.0810   | …        | 0.0614

We may also note the relation between the variance and 1/I, which is sometimes treated
as the variance equivalent of the amount of extractable informa[...] bound of the [...]


4. Distribution of χ²


Related to estimation of the parameters in the situation of bio-assay is the testing of
goodness of fit by the chi-square test. The standard practice is to calculate the classic χ²
of Karl Pearson

$$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}, \tag{20}$$

where, in the present situation, o_i is the observed number of responses at x_i, and e_i is the
'expected' number, computed as e_i = P̂_i n_i, where P̂_i is the probability at x_i calculated
from (1) using the estimated values of the parameters to replace the true values. The test of
significance is obtained by reference to a table of the distribution of chi-square* for degrees
of freedom equal to the number of dosages x_i less the number of parameters which have been
estimated.
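
A minimal sketch of this calculation follows. It assumes that the sum in the Pearson χ² runs over both the responding and the non-responding cell at each dose, which makes it equal to Σ n_i(p_i − P̂_i)²/(P̂_i Q̂_i); the fitted values α̂ = 0.05, β̂ = 0.6 and the sample are hypothetical, as are the function names.

```python
from statistics import NormalDist

ND = NormalDist()

def pearson_chi2(x, r, n, alpha, beta):
    """Pearson chi-square over the response and non-response cells at each dose."""
    total = 0.0
    for xi, ri, ni in zip(x, r, n):
        P = ND.cdf(alpha + beta * xi)          # fitted probability at this dose
        e_resp, e_non = ni * P, ni * (1 - P)   # expected responders / non-responders
        total += (ri - e_resp) ** 2 / e_resp + ((ni - ri) - e_non) ** 2 / e_non
    return total

def degrees_of_freedom(n_doses, n_estimated):
    """Number of dosages less the number of estimated parameters."""
    return n_doses - n_estimated

x, r, n = [-1, 0, 1], [2, 6, 7], [10, 10, 10]      # hypothetical sample
print(round(pearson_chi2(x, r, n, 0.05, 0.6), 3), degrees_of_freedom(len(x), 2))
```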

It is well known that the χ² of (20) is only asymptotically distributed as chi-square, and
then only if the estimates of the parameters are asymptotically efficient estimates. For finite
samples it is so distributed only approximately. Now the classic χ² of Pearson is not the only
function of the observations which is asymptotically distributed as chi-square. The normit χ²,

$$\chi^2\,(\text{normit}) = \sum_i n_i w_i (X_i - \hat{X}_i)^2,$$

which is the quantity minimized to obtain the minimum normit χ² estimate, is also
asymptotically distributed as chi-square. If the parameters have been estimated by minimum
normit χ², the normit χ² is somewhat easier to compute than the Pearson χ², since it can be
calculated directly using the estimated parameters by the familiar formula

$$\chi^2 = \sum_i n_i w_i (X_i - \bar{X})^2 - \hat{\beta} \sum_i n_i w_i (X_i - \bar{X})(x_i - \bar{x}). \tag{21}$$


When the parameters have been estimated in this way, therefore, it seems natural to
calculate the normit χ² for the chi-square test, and the question arises as to whether it is
distributed as closely to the asymptotic chi-square distribution as is the Pearson χ². Then
again, since the minimum normit χ² estimate and the maximum likelihood estimate are both
asymptotically efficient, the question arises as to whether perhaps one rather than the other,
used to calculate the expected numbers e_i in the Pearson χ² of (20), gives a better
approximation of the theoretical chi-square distribution.

In order to shed some light on these questions, the distributions of three sample χ² functions
were calculated for the experiment with P₂ = 0.5 in the case with α alone to be estimated: the
normit χ² for the minimum normit χ² estimates, the Pearson χ² for the minimum normit χ²
estimates, and the Pearson χ² for the maximum likelihood estimates.

For each sample, the χ² was computed, using the estimates with four decimal places and
writing the calculated χ² to three decimal places. The result was punched into a card
together with the sample designation and its probability. The distribution of the χ² was then
easily obtained by cumulating the probabilities progressively on the tabulator, with the
samples ordered on the value of the χ². The cumulative distributions for the sample χ²
functions are shown in Fig. 4 together with the asymptotic chi-square distribution, and
some selected values are listed numerically in Table 7.


* By the distribution of chi-square we mean the exact distribution of the sum of squares of a
number of independent normal deviates having unit standard deviations.

Table 7. Distribution of sample χ² functions for experiment with P₂ = 0.5, α alone to be
estimated; comparison of the percentage frequencies with the theoretical frequencies

The entries give in % for the respective χ²s, Pr(χ² > χ₀²), where in the theoretical asymptotic
chi-square distribution for 2 degrees of freedom, Pr(χ² > χ₀²) = L %.

                          Max. likelihood    Min. normit χ² estimate
                          estimate
Level L (%)    χ₀²        Pearson χ²         Pearson χ²    Normit χ²
25             2.773      26.15              26.66         24.02
20             3.219      22.69              22.22         20.96
15             3.794      15.65              15.49         11.76
10             4.605       9.65               9.64          8.15
5              5.992       5.14               5.12          2.70
1              9.210       0.79               0.81          0.39
0.5           10.597       0.46               0.46          0.14


Inspection of the normit χ² distribution in Fig. 4 and Table 7 reveals rather wide
discrepancy of the distribution frequencies from those of the asymptotic chi-square
distribution. The frequencies for specific points of χ² are sometimes higher, sometimes lower than the
asymptotic frequency, until about the 15% level. From that point forward the normit χ²
frequency is consistently too low; at the 5% level the frequency is 2.7%, and at the 1% level the
frequency is 0.4%. If, therefore, the normit χ² were used for a test at either of these levels,
the tested hypothesis would be 'rejected' considerably less frequently than required by
theory. The Pearson χ², for the same minimum normit χ² estimates, is seen to be in much
better agreement with the asymptotic chi-square distribution in the region below the 15%
level. For the 5% level the frequency is 5.1% and for the 1% level it is 0.8%. The normit χ²
should therefore not be used in preference to the Pearson χ² for a χ² test of significance, even
though it is somewhat easier to calculate. This conclusion is in line with that of David (1950),
who condemned the use of the Neyman reduced χ² for a chi-square test of significance, even
when there was good reason for preferring an estimate obtained by minimizing that χ².

Comparison of the distribution of the Pearson χ² in Fig. 4 and Table 7, for the minimum
normit χ² estimate and the maximum likelihood estimate, shows them to be closely similar
in their approximation of the asymptotic chi-square distribution. At the 5% and the 1%
levels the approximation is somewhat closer for the minimum normit χ² estimate than for
the maximum likelihood estimate. Thus, the advantage of the minimum normit χ² estimate
over the maximum likelihood estimate found in respect of the variance and mean square
error is supported, or at least not offset, by the approximation of its χ² distribution to the
asymptotic distribution.

5. Conclusion 


It has been mathematically demonstrated 
estimate falls in the class R.B.A.N. 


tial

[Figure 4: cumulative distributions of the sample χ² functions plotted against χ₀² (horizontal axis, 0 to 9); one curve is labelled 'Normit χ² with minimum normit χ² estimate'.]

Fig. 4. Distribution of sample χ² functions; estimate of α with β known; the theoretical
asymptotic distribution is the chi-square distribution for 2 degrees of freedom.


statistical literature, both estimates are normally distributed, with mean equal to the
parameter estimated and variance given by

$$\sigma^2(\hat{\alpha}) = 1 \Big/ \sum \big(n_i Z_i^2/(P_i Q_i)\big), \tag{22}$$

$$\sigma^2(\hat{\beta}) = 1 \Big/ \sum \big(n_i Z_i^2/(P_i Q_i)\big)(x_i - \bar{x})^2, \tag{23}$$

$$\bar{x} = \sum \big(n_i Z_i^2/(P_i Q_i)\big)\,x_i \Big/ \sum \big(n_i Z_i^2/(P_i Q_i)\big). \tag{24}$$

For small samples, we have found in the experiments presented that the mean square
error of the minimum normit χ² estimate is smaller than that of the maximum likelihood
estimate. Since this conclusion is based on numerical findings in specific situations and not
on a mathematical demonstration, the question may be asked whether it can be
proved, and how general it may be regarded as being. Of course no specific numerical results
can be considered to be a proof. Even so far as concerns the particular dosage arrangements
for which the calculations have been made, there is the possibility that some arithmetical
errors have been committed. All that can be said on this point is that it is unlikely in the
extreme that arithmetical errors have been made that are of so serious a nature that their
correction would reverse this conclusion. Care has been taken, so far as practicable, to check
each arithmetical operation involved in the calculation, and in addition over-all indirect
checks have been applied whenever the possibility presented itself. The ratios of and
differences between the mean square errors of the two estimates, as they have emerged from
these calculations, change in a regular way with change in the assumed values of the
parameters. The conclusions are essentially the same as those previously reached
independently with the use of carefully planned samples. The results are in agreement, in general
and in detail, with those obtained in investigations of the logistic function, which is very
similar in curvature characteristics to the integrated normal function. So far, therefore, as
the specific experimental conditions in which these investigations have been made are
concerned, there seems no reasonable doubt but that the mean square error of the maximum
likelihood estimate is larger than that of the minimum normit χ² estimate.


Table 8. Estimate of α and β simultaneously, with dosages spread

P₂ = 0.5, the lower dosage is at P₁ = 0.1, the upper is at P₃ = 0.9 instead of at 0.3 and 0.7 as in the
standard experiments. The samples with infinite estimate by maximum likelihood constitute 12.2% of
the sampling population and are omitted in calculating all statistics. The results are based on the total
sampling population.

                         α̂                              β̂
Statistic                Max.          Min.           Max.          Min.
                         likelihood    normit χ²      likelihood    normit χ²
Bias                     0             0              0.0507        -0.0983
Variance                 0.1052        0.0562         0.1246        [illegible]
Mean square error        0.1052        0.0562         0.1272        0.0663


A further question that may be asked is whether the same conclusion applies to
experiments with a different arrangement of dosages. It was early suggested in connexion with
the logistic function that the relation between the mean square errors of the two estimates
might be reversed, if the dosages were arranged with a wider 'spread'. For this reason
a calculation was made, using the total population of samples, for an experiment with
P₂ = 0.5, but with the lower dosage at P₁ = 0.1 and the upper dosage at P₃ = 0.9 instead of
at 0.3 and 0.7, respectively. The results (Table 8) not only confirmed the larger mean square
error of the maximum likelihood estimate, but showed it to be in larger ratio to that of the
minimum normit χ² than with the more narrowly disposed dosages, and moreover pointed
out more emphatically the difficulty encountered in the maximum likelihood estimator's
yielding infinite estimates.

Similarly, it was suggested that a different sort of estimate, such as a bio-assay estimate
of relative potency, might show the smaller error to be with the maximum likelihood
estimator. A calculation was therefore made for an experiment simulating a four-point
parallel assay. In such an experiment a mixture of unknown strength is to be assayed as to
its relative potency with respect to another mixture considered as standard. For the
standard mixture, two concentrations x₁ and x₂ measured in logarithmic units are used, and n_i
animals exposed with each; for the unknown, the same two concentrations x₁ and x₂ are used,
and n_i exposed at each. Thus there are four relative frequencies observed, two corresponding
to the standard mixture, two to the unknown. Two normit lines (α + βx_i) are fitted, one to
the points of the standard mixture, one to the points of the unknown, these lines constrained
to be parallel, so that β is the same for both. The distance between the fitted parallel lines, in
units of x, is the difference of the logarithms of the dosage of the unknown and standard
respectively, which produces the same response. This is taken as the measure of the logarithm
of the relative potency, and given by M = (α̂_u − α̂_s)/β̂, where α̂_u is the estimate of α
corresponding to the normit line of the unknown, and α̂_s is the similar estimate corresponding to
the line of the standard mixture (see Finney, 1947). The estimate, based on 100 stratified
random samples, showed the maximum likelihood estimator to have the larger mean square
error (Table 9).


Table 9. Estimate of log relative potency M, with a four-point parallel assay

Simulated experiment with 'standard' and 'unknown' taken as of equal potency, M = 0. For both,
lower dosage x₁ at P₁ = 0.3, upper dosage x₂ at P₂ = 0.7. n_i = 10 for each of the four observations.
Based on 100 stratified random samples of the four-point assay.

Estimator              Bias*      Variance    M.S.E.
Maximum likelihood     -0.014     0.185       0.185
Minimum normit χ²      -0.013     0.175       0.176

* The expected value is actually zero; the empirical figures are to be attributed to sampling
fluctuation.


Lastly, it was suggested that the relative results would be reversed if the number of doses
were greatly increased. In this connexion specific results can be quoted only for experiments
with the logistic function. Since in other experiments performed with both functions the
results are similar, these findings with the logistic function are relevant. Experiments
comparing the maximum likelihood estimator and the minimum logit χ² estimator, with
dosages up to 11 (Berkson, 1955b), and experiments comparing the maximum likelihood
estimator with the minimum Pearson χ² estimator with dosages up to 100 (Taylor, 1953b),
showed the maximum likelihood estimator to have the larger mean square error.

In spite of all the corroboratory findings, in a variety of experimental conditions, of the
larger mean square error of the maximum likelihood estimator, the possibility is not excluded
that some conditions may be definable in which the minimum normit χ² estimator has the
larger error. It seems unlikely, however, that if such conditions are discovered, they will
correspond to any met in practice in a well-designed bio-assay or in similar assay.

Even with all this, the important finding of the present investigation is not that the
minimum normit χ² estimator has a smaller error than the maximum likelihood estimator,
even if this is accepted in full generality, but rather that the maximum likelihood estimator
does not have the smaller error in all circumstances. This last really did not require proof,
since no serious evidence to the contrary had ever been presented. What does require wider
appreciation among statistical writers is that it is unwarranted to claim that some
well-known principle of estimation necessarily always yields the best estimate. If in the present
circumstances the minimum normit χ² estimate is easier to compute, and at the same time
is a better estimate than that provided by the maximum likelihood estimator, there may be
other methods of estimate (and I believe there are) which are even easier to compute and,
in the practical circumstances in which they are applied, may be better than the minimum
normit χ² estimator.


To the many young ladies and gentlemen, who, over the last 8 years, have helped in the 
calculations required for the investigations referred to in this paper, I tender my thanks. 
Some, but not all, of the individuals are: Gretchen Eusterman 


ginning, and whose talent for orderly filing, and 


matter how falsely identified by me, made 
progress possible, and prevented Bedlam from supplanting the Mayo Clinic.

SHORTER INTERVALS FOR THE PARAMETER OF THE BINOMIAL
AND POISSON DISTRIBUTIONS


By W. L. STEVENS*


SUMMARY. Different methods are examined for determining, for the binomial and Poisson
parameters, narrower intervals than those furnished by the classical method. A way is found for
calculating, from existing tables, the limits which are produced by the device of adding, to the
observation, a randomly chosen number between 0 and 1.


1. INTRODUCTION 


upper limit. This is certainly 
quite common in practice; for 


There exists, however, anothe: 
shorter intervals (Anscombe, 194 
1950), which possess the two de 


r and perhaps more ele 
8; Stevens, 1950; Toche 
sirable properties: (1) 


ne. Then y = f+ vis a real number 
can find 7, and T; 


of black balls in ¢ 
limit, between th

2. CALCULATION OF LIMITS

The functional relation between π₀, the lower 10% limit, and y, for n = 10, is illustrated in
the graph, based on a table in Stevens (1950). It is seen to be composed of arcs concave
upwards linking the points corresponding to integral values of y. The function is continuous
but the first derivative discontinuous at these points.

[Figure 1: graph of π₀ (vertical axis, 0.1 to 0.3) against y (horizontal axis, 1 to 6).]

Fig. 1. Portion of the graph showing relation between lower 10% limit (n = 10) and y = f + x.


We shall continue the discussion in terms of the lower limit: the necessary change of
words to apply it to the upper limit will be obvious. Now a simple first suggestion for finding
a limit, without using a table, would be to find, by the usual method, limits π_f and π_{f+1}
corresponding respectively to the observations, f in n and f + 1 in n, and to make a random
linear interpolation between these two values. This is equivalent to replacing the arcs in the
graph by straight lines joining their extremities. But it is immediately seen that the
consequence of doing this is that the probability that π lies below the lower limit will be not less
than that stated; though admittedly it will not be much more. Now, if we have to make an
inequality statement, this must be in the opposite form, i.e. that the probability is not
greater than that stated. Hence, this method cannot be considered satisfactory.

We therefore turn to consider the possibility of interpolating between two consecutive
integral values of y (y = f and y = f + 1, i.e. x = 0 and x = 1) by means of a Taylor's series.
The value of dπ/dx (we will drop the suffix, 0) is obtained from Stevens, equation (2.22); we
should note, however, a mistake in equation (2.13): the h₀ and h₁ should be interchanged.


We find


a 
dx (n—fy, fü 9) 
l-r m 


The second derivative is found in the usual way from 


da 2 (dm| , 2 (5) (z) 
ds 7 ac) tin dx} \dx}’ 




the calculation being simplified by the fact that it is required only at the values x = 0 and 
x = 1. The results are 


d? /da* | 


x dn/dx | 
| 3| ge 
"t = Maes | 


If x is less than ½, we extrapolate forward from the lower value of π, using the formula

$$\pi + \frac{d\pi}{dx}\,x + \tfrac{1}{2}\frac{d^2\pi}{dx^2}\,x^2,$$

and if greater, we extrapolate backwards from the higher value, using

$$\pi - \frac{d\pi}{dx}\,(1 - x) + \tfrac{1}{2}\frac{d^2\pi}{dx^2}\,(1 - x)^2.$$

To illustrate the process, we will consider the lower 10% limit for the result '3 in 10'. From
Fisher & Yates or other table, we find the lower 10% limits for this result and for the result
'4 in 10'. We find

x    π        dπ/dx     ½(d²π/dx²)
0    0.116    0.0387    0.0199
1    0.188    0.116     0.0411

The results of interpolation can now be compared with the tabulated values as follows:

x      (a)      Table    (b)
0.0    0.116    0.116    -
0.1    0.120    0.120    -
0.2    0.125    0.124    -
0.3    0.129    0.129    0.127
0.4    0.135    0.135    0.133
0.5    0.140    0.141    0.140
0.6    0.146    0.148    0.148
0.7    0.153    0.157    0.157
0.8    -        0.166    0.166
0.9    -        0.176    0.177
1.0    -        0.188    0.188

(a) Extrapolated forward from x = 0. (b) Extrapolated backward from x = 1.

The errors of interpolation would seem to be too small to be revealed by this comparison.

To find the upper 10% limit, we find from the table the limits for the observed result
'3 in 10' and for the result '2 in 10'. These correspond respectively to x = 1 and x = 0.
Hence

x    π        dπ/dx    ½(d²π/dx²)
0    0.450    0.150    -0.0432
1    0.552    0.064    -0.0255

For x = 0.5 both forward and backward extrapolation yield π = 0.514, while the table gives
π = 0.512. The difference probably represents the error in the table.

It is advisable to use the same value of x for both limits and indeed for all intervals, if
more than one is determined. Thus suppose the table of random numbers yields 268...,
which on rounding gives x = 0.27. The limits will be

π₀ = 0.116 + (0.039)(0.27) + (0.020)(0.27)² = 0.128,
π₁ = 0.450 + (0.150)(0.27) + (−0.043)(0.27)² = 0.487

and we can state that
Pr(π < 0.128) = 10%
Pr(0.128 < π < 0.487) = 80%
Pr(0.487 < π) = 10%.
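The extrapolation rule and the worked figures for '3 in 10' can be checked with a short Python sketch. The function name and argument layout are hypothetical; the inputs are the tabulated limit, first derivative and half second derivative at x = 0 and at x = 1, as quoted in the text.

```python
def randomized_limit(x, at_zero, at_one):
    """Second-order Taylor extrapolation of a limit at a random fraction x.

    at_zero and at_one are (limit, first derivative, half second derivative)
    at x = 0 and x = 1. For x below one half the series is taken forward
    from x = 0, otherwise backward from x = 1.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 and 1")
    if x < 0.5:
        limit, d1, half_d2 = at_zero
        return limit + d1 * x + half_d2 * x ** 2
    limit, d1, half_d2 = at_one
    t = 1.0 - x
    return limit - d1 * t + half_d2 * t ** 2


# Values quoted in the text for the result '3 in 10', 10% limits.
lower_zero, lower_one = (0.116, 0.0387, 0.0199), (0.188, 0.116, 0.0411)
upper_zero, upper_one = (0.450, 0.150, -0.0432), (0.552, 0.064, -0.0255)

x = 0.27
lower = randomized_limit(x, lower_zero, lower_one)
upper = randomized_limit(x, upper_zero, upper_one)
print(round(lower, 3), round(upper, 3))
```

Using one function for both limits makes it easy to apply the same random x throughout, as the text advises.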
Special treatment is needed for f = 0 and f = n. It is recalled that if f = 0 and x < 1 − P,
then π₀ = 0. Hence, if such a value of x is drawn (e.g. x < 0.9, for a lower 10% limit) π₀ is
taken as zero. Otherwise, we extrapolate backwards from the limit corresponding to f = 1,


Finally, we may note that essentially the same method may be used for setting limits to
the parameter of the Poisson distribution. The derivatives at x = 0 and 1 are:


v dy [da | dulda? | 
| 

TE | G'E) 

eh | MES 


3. CONCLUSION

It is open to question whether anything material is gained by devices for finding intervals
shorter than those provided by the classical method. If, however, the statistician is
determined to have them, it would seem that, for both practical and theoretical reasons, the
device of adding to the frequency a random number between zero and one is superior to any
other. This paper has shown that these intervals can be found by a simple interpolation in
existing tables.

UPPER PERCENTAGE POINTS OF THE GENERALIZED BETA
DISTRIBUTION. II


By F. G. FOSTER
Research Techniques Unit, London School of Economics


1. INTRODUCTION


These tables extend the tabulation of the 80, 85, 90, 95 and 99% points of I_x(k; p, q) to
the case k = 3. They are a continuation of the tables for k = 2 given in Foster & Rees (1957),
which will be referred to as 'I'. Reference is made to I for all definitions.


2. INTERPOLATION


Interpolation will be required for integer values of ν₁ = 2q + 2. For ν₁ greater than about 80,
linear interpolation will give accuracy to one unit in the fourth decimal place; for smaller
values of ν₁, 4-point interpolation to halves should be used. For interpolation between
ν₁ = 194 (q = 96) and ν₁ = ∞, 3-point harmonic interpolation, based on q = 48, q = 96
and q = ∞, may be used, as indicated in I.


3. USES OF THE TABLES 


Typical applications to tests of significance for the case k = 2 were given in I. A further
example is given below to illustrate the present tables.
Example. Table 1 gives the analysis of dispersion for the three characters, head length
(x₁), height (x₂) and weight (x₃), measured on 140 schoolboys of almost the same age,
belonging to six different schools in an Indian city. This example is taken from Rao (1952,
p. 263) who examines the data using the alternative Λ criterion of Pearson & Wilks.


Table 1

                                        Sums of products matrix
                  D.F.            x₁²         x₂²        x₃²                   x₁x₂       x₁x₃       x₂x₃
Between schools   5       Q_ij    752.0       151.3      [illegible]           214.2      521.3      401.2
Within schools    134     W_ij    12,809.3    1,499.6    21,009.6              1,003.7    2,671.2    4,123.6
Total             139     S_ij    13,561.3    1,650.9    22,022.[illegible]    1,217.9    3,192.5    4,524.8

It is required to test whether there are any significant differences in boys' physique
between schools. On the null hypothesis of no differences, the matrices (Q_ij) and (W_ij),
divided by their degrees of freedom, are independent estimates of the same parent
dispersion matrix, and our test consists in computing the greatest latent root of |Q − θS| = 0,
and entering the table with ν₁ = 134 and ν₂ = 5. We find that θ_max = 0.10055, which is
only significant at the 15% level.
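The greatest latent root can be found numerically in several ways; the paper does not say how the value above was obtained. The Python sketch below is one illustrative route, assuming S = Q + W is the total sums of products matrix and using power iteration on S⁻¹Q. The function names and the small test matrices are hypothetical, and Table 1 is not used because one of its entries is illegible in this copy.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def greatest_root(q, w, iterations=500):
    """Greatest theta satisfying |Q - theta (Q + W)| = 0.

    Assumptions: Q + W is non-singular, Q is not the zero matrix, and the
    starting vector of ones is not orthogonal to the dominant direction.
    """
    n = len(q)
    s = [[q[i][j] + w[i][j] for j in range(n)] for i in range(n)]
    v = [1.0] * n
    theta = 0.0
    for _ in range(iterations):
        qv = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
        u = solve(s, qv)                  # u = S^{-1} Q v
        theta = max(abs(c) for c in u)    # v is scaled to maximum component 1
        v = [c / theta for c in u]
    return theta


# Hypothetical 2 x 2 check: S = diag(10, 4), so the roots are 0.1 and 0.5.
theta_max = greatest_root([[1.0, 0.0], [0.0, 2.0]], [[9.0, 0.0], [0.0, 2.0]])
```

The resulting θ_max would then be compared with the tabulated percentage points for the appropriate ν₁ and ν₂.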


4. METHOD OF COMPUTATION 


The computations were carried out on the DEUCE Computer of the English Electric
Company. Let θ_max denote the greatest root of

$$|\nu_2 B - \theta(\nu_1 A + \nu_2 B)| = 0,$$

where A and B are independent estimates, based on ν₁ and ν₂ degrees of freedom, of a parent
dispersion matrix of a trivariate Normal population. Define

$$I_x(3; p, q) = \Pr\{\theta_{\max} \le x\},$$

where p = ½(ν₂ − 2), q = ½(ν₁ − 3 + 1). Roy (1945) gives a formula which, in the present
notation, may be written
L(3; p,q) 


= K{2B,(p, 4) Bp + 2, 2q)—2B,(p + 1,9) B, (2p 1,29) -a»(1 2j B,(2; 2,0); 
where B,(p,q) is the Incomplete Beta function; 


Kai 2+4 1 — POp2g943) . 
2p+q+1 B(p,g) (2p 1)T (29-1)? 
and B_x(2; p, q) = I_x(2; p, q)/K, as defined in I. As it stands, this formula is too ill-conditioned
to be useful for computation. If, however, we substitute for B_x(2; p, q) and distribute the
normalizing constant, K, we obtain


L(3; p,q) = L(2p + 2,29) L(p,q) + p[q(2p + 29-1) Us (2p + 2, 29) L,(p, q) 
—L(2p + 1, 24) L(p + 1, q) b, (2p, 29) 


Lp.) —ba(P, 4) L(2p, 24)}, 
where J,( p,q) is the Beta distribution, and 


T(p+q+1) 
b, ,Q) = x» l-zg —— $5.1 
(2,4) ( Yar (p+ 1) F(g) 
The ranges of p and q were p = ½(½)4, q = 1(1)96. For the integral values of p, I_x(p, q)
was computed recursively as in I, and b_x(p, q) by means of the relations
b,(p,0) = 0 (p» 0), 
b. (1, 1) = 22201 — a), 
ba(1,q+1) = ,(,q) (q¢+2)/q (q>0), 
b,(p,q) = (p+) (p-1) 


(2d) (p-b +2b(p—1,q)} (p, 


q>0). 
For the half-integral values of p, I_x(p, q) was computed recursively by means of
1,(3,1) = at, 
L{p+1,1) = aI (p, 1) 


(p » 0), 
L(s-n-lmao-Lo 

2 q- 1) qt (5. q) 2$ d)) + L,(3, q) (g » 0), 

$$I_x(p, q) = (1 - x)\,I_x(p, q - 1) + x\,I_x(p - 1, q) \qquad (p, q > 0),$$
and b_x(p, q) was computed by means of
6,(4, 1) = 3x3(1— 2), 
b(t, q 1) = 5, (5,9) (1—2) (2 + 3)/29, 
5,(p, 0) = 0, 


together with the last formula for b_x(p, q) given above.

As projected in I, the percentage points were obtained directly without prior tabulation
of the distribution function. The method used (which may be described in more detail
elsewhere) was to compute a table of I_x(3; p, q) for the whole range of p and q for a sequence
of values of x, starting at 0 and finishing at 1 at a fixed interval, h. The tables for the current
value of x and the three preceding values, x − h, x − 2h, x − 3h, were stored on the drum.
On completion of each current table, a 4-point inverse interpolation was carried out for
each of the five required percentage points on the four values in position (p, q) on the four
tables for all positions (p, q) such that I_{x−2h}(3; p, q) was less than, and I_{x−h}(3; p, q) was
greater than the required percentage value. The resulting value was punched out on a card
together with the corresponding p and q and an indication of the percentage value. The
values were in fact punched out twice on each card, once rounded to four decimal places and
once to six decimal places. Differencing was carried out on the latter values as a check on
accuracy. The cards were subsequently sorted and the rounded values tabulated in the
required layout. An interval h = 2⁻⁸ was found to give accuracy to at least four decimal
places for most of the percentage points. Some of the 99% points near the upper limit,
x = 1, had to be obtained using a smaller interval of 2⁻¹⁰.
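The 4-point inverse interpolation step can be illustrated with a short Python sketch. The paper does not state which interpolation formula was programmed on DEUCE; Lagrange's form, with the roles of argument and function value exchanged, is assumed here, and the function name and test values are hypothetical.

```python
def inverse_interpolate(xs, values, target):
    """Estimate the x at which a tabulated function equals target.

    xs are the arguments (e.g. x - 3h, x - 2h, x - h, x) and values the
    corresponding function values, which must all be distinct. The Lagrange
    polynomial is built in the function value, so the result is x regarded
    as a function of the tabulated value.
    """
    if len(xs) != len(values):
        raise ValueError("xs and values must have the same length")
    result = 0.0
    for i, (x_i, v_i) in enumerate(zip(xs, values)):
        weight = 1.0
        for j, v_j in enumerate(values):
            if j != i:
                weight *= (target - v_j) / (v_i - v_j)
        result += weight * x_i
    return result


# Hypothetical check: the tabulated function is 2x + 1 at x = 0, 1, 2, 3,
# so the value 4 is attained at x = 1.5.
x_at_target = inverse_interpolate([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], 4.0)
```

With four points the formula is exact whenever x is a cubic in the tabulated value; in the setting described above the target would lie between the two middle values, where the interpolation error is smallest.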

The same method of computation could be used to obtain values for ν₂ > 10, should the
need for these be felt. At this stage, however, it was thought more useful to devote the
computational effort to obtaining a fuller tabulation for Pys


The author is indebted to the Director of Research of the English Electric Company for
permission to use the DEUCE Computer of the London Computing Service for this work;
and to the staff of the London Computing Service for much helpful advice and assistance.

Generalized Beta distribution (cont.)


| | | | 
E 3 | 4 | 5 6 | 7 8 9 | 10 
| | 
| f= 
| 
| 
| | 
| f 


SS | | 
" — S T -— xx SS | s pcd 
P | | | 
44 0-80 | 0-1916 | 0-2236 | 0-2517 | 0-2769 | 0-2999 | $3210 | asang 58 
85 | -2073 | -2398 | -2678 | .2931 | -3161 | | Ee "aras 
90 | -2280 | -2605 | 2888 | 3 z E E 
2x acd ee] 255 3141 3370 | .3580 ^ 3773 +3953 
: "3465 -3692 | -3898 -4089 -4265 
99 | -3266 | -3586 | -3861 | -410 2 
| 4 | -4321 | -4519 | -4700 | -4867 
46 0-830 | 0-1840 | 0-2150 | 0-2423 | 0.2668 0-2 | 
* 386. -1991 -2304 ‘ag. T oom :2892 | 0-3099 | 0-3290 | 0-3469 
40 | -2192 | -2507 | -2783 | -3030 | goo | 2258 | MAT | -3625 
-95 | .2506 | .2824 | agro | 3030 | -3254 | -3459 | -3649 | -3826 
| 2 | 3099 | -3345 | +3567 é | | 39 
99; -3147 | .3460 | -a730 | 3 3770 | .3958 | -4132 
Is oreo | 969 | .4183 | -4379 | -4558 | -4724 
0:1770 | 0-2070 | 0-2335 | 0-2574 aes i 
‘aed asia x | | 02574 | 0.2703 | 02995 | 0-3183 | 0:3358 
2220 -2487 -2727 2 | 
*90 | -2110 | .2416 | begs | (spag | 2940 | -3148 | 3336 | -3611 
-95 -2415 -2723 Shan | a :3145 73346 | -3533 | -3707 
-99 | 3037 | -3343 | .3608 | lasag | 2491 | -3650 | -3835 | -4006 
S0 080 | 01704 | o1996 | 032254 | ozs | cont | LiT | 74624 | “4888 
:85 | -1846 | .2141 | -2401 | p | Cameo | 02808 | 03082 | 0.3264 
30 | 2004 | -2332 | -2504 | -2829 | 3043 | il | 2291 | -3403 
95 |. -2329 | -2630 | -2893 | .3128 | iag | 223 | 3424 | -3595 
:99 |' 2934 | .3933 | .3493 | .3723 | 5.42 | 73538 | -3719 | -3888 
"2i | :9932 | 4122 | .4297 | -4460 
080 | 01644 | 01927 | 0-2178 | 0-2405 | 0.26] | 
'88 | -1781 | -2068 | .2391 | .2549 3 | 02807 | 0:2987 | 0-3156 
:90 | -1963 | -2253 | .9508 | .2738 | ogo, | 2952 | -3133 | 3302 
95 | .2950 | -2548 | . '2048 | -3141 | -3321 | .3490 
2799 | -3030 | .3239 
-99 | -2838 -3131 ‘3385 | -3612 | -3817 dn :3610 3771 
54 080 | 0-1588 | 01863 | O-2107 | Qo328 | o "004 | 4177 | -4338 
85 | -1721 | -1999 | .2248 | 245 uu | 02721 | 02898 | 0-3064 
90 | -1897 | -2179 | .2428 | .2653 | 285 2663 | 3040 | 3206 
-95 9175 2461 ; 58 +3048 +3225 +3390 
2712 +2937 “314 4 33 
'99 | 2748 | -3034 | .3284 | .3508 di 3932 | 3507 , -3671 
56 080 | -1636 | -1802 | 2040 | 225a | 9 Idle ee 
35 | -1604 | 1995 | .2176 | .2394 | 3&5. | 2041 | 2814 | -2976 
90 | 3835 | 2110 | . z 94 | +2779 | .2953 | -3116 
= 2353 | .2972 | 9754 | 3 
‘op | 2100 | 2385 | .2630 | -2850 | sogi | 2559 | -3133 | -3296 
"2604 | -2944 | .3188 | .3407 | -agos | “2287 | :3410 | -3572 
58 2 01486 | 01746 | 01978 | 0-2189 | 0.93 39787 | -3956 | -4113 
Ee 1611 | 1875 | .2110 | -2322 E 0:2565 | 0.2734 | 0-2894 
as | 7T | 045 | .2282 | 2497 | sepa | 2/09 | -2870 | -3030 
o | 2041 | 28313 | 2552 | 2768 | ‘agog | 2578 | -3047 | -3207 
a 584 :2858 3098 | -3312 Es :3148 |  .3318 -3477 
omn 01440 | 01693 | o-1919 | 0-2125 | 0.23 ahh RUE 
ES *1561 -1819 -2047 -2255 E | 0-2493 0-2659 0-2816 
-1723 -1984 -22 ; 47 -2625 9 |  .9950 | 
-95 16 2496 | 792 29 
:1979 -2245 3 +2619 +2798 4 2 | 
-99 2479 -2691 2965 +312 
"2509 | -2777 | -3012 '2884 | .3064 | .393 3387 
62 080 3223 | -3415 | 359 E | 
pi 0-1396 | 0-1643 | 0.1884 | 03085 | 039 2 -3757 -3910 
‘90 | 1515 | -1765 | .1989 | .2195 2301 | 02425 | 0.2588 | 0.2742 
1672 | .1927 | .9153 | cogs 80 | -2555 | .2718 | -2873 
:95 1921 -2181 8 "2547 -2723 
99 | -2438 2410 | 2617 | 280 | 2 -2887 | -3042 
-2701 .2932 -3139 EU n :3148 -3302 
‘ 3664 | -3816 
This table gives the values of x for which Pr(θ_max ≤ x) = I_x(3; p, q) = P, where p = ½(ν₂ − 2), q = ½(ν₁ − 2).
aeu | | | 
Vg f 
7 lit $ | a 5 | 6 7 8 9 10 
eee a |. N = d n 
P | 
64 0-80 | 0-1355 | 0-1596 | 0-1811 | 0-2008 | 0-2191 0-2361 | 0-2521 | 0-2672 
-85 -1471 1715 -1933 | -2132 -2316 -2488 :3648 | -2800 
-90 1624 | -1873 -2094 -2295 -2480 -2652 -2814 -2966 
“95 -1867 | -2120 | -2345 -2548 | -2734 -2907 -3069 -3221 
-99 -2371 -2629 -2855 "3058 | -3245 | -3416 -3576 +3726 
66 0-80 | 0-1317 | 0-1552 | 0-1762 | 01955 | 02133 | 0.2300 | 0-2457 | 0-2605 
‘85 -1429 -1668 -1881 -2076 | -2256 -2424 -2582 -2731 
-90 -1578 -1821 2088 | -2234 | -2416 -2585 -2744 -2893 
-95 -1815 -2063 -2283 -2482 | -2665 -2835 -2994 -3144 
-99 -2308 -2560 -2782 -2982 | -3165 -3335 -3492 -3640 
68 0-80 | 0-1280 | 01509 | 01715 | 0-1904 | 0-2078 | 0-2242 0.2396 | 0-2542 
-85 -1390 -1623 1831 | -2022 ‘2198 | -2363 -2518 -2665 
-90 +1535 +1773 1984 | -2177 +2355 +2521 +2677 +2824 
-95 -1767 -2009 -2224 -2419 -2599 -2766 -2923 -3070 
-99 -9948 -2495 -2713 -2910 “3090 | -3256 -3412 -3558 
70 0-80 | 01246 | 01470 | 01671 | ©1855 | 0.2026 | 0-2187 | 0-2338 | 0-2482 
E -1353 -1580 -1784 | -1971 -2144 | -2306 -2458 -2602 
-90 -1495 1727 1934 | -2123 -2297 -2461 -2614 -2758 
-95 -1720 .1958 -2168 -2360 -2536 -2701 -2855 -3000 
99 2191 +2433 +2647 +2840 -3018 | +3182 +3335 | 3479 
72 0-80 | 01213 | 0-1432 | 0-1629 | 0-1809 | 0-1977 | 0-2134 | 0.2983 0-2424 
-85 -1317 -1540 -1740 -1922 -2092 -2251 2400 | -2542 
-90 -1456 -1683 -1886 -2071 -2242 -2403 ‘2553 | -2696 
-95 -1677 -1909 -2115 :2303 | -2477 -2638 “2790 don 
*99 -2136 :9374 -2584 -2774 +2949 “3111 :3262 :3404 
74 0-80 | o1182 | 01396 | O-1588 | 01765 | 0-1930 | 02084 | 0-2230 | 0-2369 
-85 -1284 -1502 -1697 -1876 -2043 | -2199 -2346 -2485 
-90 -1419 -1641 +1840 | -2022 “2190 kd = E 
| 3B—B 1231 1$ 19 48-- 
-99 -2085 -2318 +2624 | -2711 +2883 | As us 
76 080 | o1153 | 01362 | 01550 | 01724 te E Es ie 
*85 +1252 -1465 -1657 -1832 pU | : 
+1797 +1975 +2140 +2294 +2440 +2578 
80 | +1384 | -1602 2017 | -2198 | -2365 | -2521 | -2668 | -2807 
Res a | aues | aum | men] ee | ues | ume] d 
e 1684 | 0-1842 | 0-1991 | 0-2132 | 0-2266 
8 080 | onas | o19 | oisi y 1e Ib seg | gece 
-85 -1222 -1430 -1618 | areg n md di eius 
*90 -1351 -1564 -1755 | E jr ore ise RE 
*95 -1557 -1776 -1971 99 .9760 9915 -3060 | -3196 
in pee: us ec: P 0-1801 | 0-1948 | 0-2086 | 0-2218 
80 0:80 | 0-1098 | 0-1298 | 0-1479 Lg “1907 | -2055 .2195 .2327 
'85 :1193 -1397 -1581 pe -2046 -2196 :2337 +2470 
-90 -1320 -1528 -1715 d " poem wae San E 
95 | -1521 | -1785 | -1927 | -21 2702 | -2855 | -2998 | .3133 
99 +1944 -2164 +2360 +2538 
82 1610 | 0-1762 | 0-1906 | 0-2042 | 0-2172 
980 | 01073 | 01269 | 01446 | 0-1 2| -1866 | -2011 -9149 .9979 
"85 1165 -1366 -1546 por -2002 -2149 -2288 -2420 
90 | .1989 | -1494 | -1677 Eo 2215 | -2364 | -2504 | -2637 
‘95 | 1487 | -1697 | -1884 | -20 .2647 | -2797 | -2938 | -3071 
:99 -1901 -2118 -2310 -2485 
This table gives the values of $x$ for which $\Pr(\theta_{\max} < x) = I_x(3; p, q) = P$, where $p = \tfrac12(\nu_2 - 2)$, $q = \tfrac12(\nu_1 - 2)$.

Upper percentage points of the generalized Beta distribution. II
Generalized Beta distribution (cont.)

$\nu_1$ | $P$ | $\nu_2 = 3$ | 4 | 5 | 6 | 7 | 8 | 9 | 10
84 80 01048 | 01240 | O-1414 | 0-1575 | 0-1725 | 0-1866 | 0-2000 a 
85 | -1139 | 1335 | -1612 | -1675  -1827 | -1969 2105 | +2238 

-90 | -1260 | +1461 | -1641 | -1807 | -1960 | -2105 :2241 an 

-95 -1454 “1660 “1844 :2013 | -2169 2316 | -2454 | -2584 

-99 -1860 -2073 -2262 | -2434 | -2594 2742 | -2881 | 3013 

86 0:80 | 0-1025 | 01213 | 0-1384 | 0-1541 | 0-1689 , 0-1828 | 01959 0:2085 
-85 “1114 1306 | -1480 | -1640 -1789 -1929 -2062 -2189 

-90 -1233 ‘1429 | -1606 | -1769 | -1920 -2062 -2197 -2324 

+95 -1422 11625 | -1806 +1971 2125 | +2269 | -2405 | 2534 

-99 -1821 +2030 -2216 -2386 -2542 | -2689 9826 , -2956 

88 0:80 | 0-1003 | 0-188 | 0-1355 | ©1510 | 01054 | 0-1791 | 0-1920 0-2044 
“85 -1090 1279 | -1449 | -1606 -1753 | -1891 | -2021 -2146 

-90 1206 | -1399 | -1573 | -1733 1881 | +2021 | -2154 -2279 

95 | +1392 | -1591 -1769 -1932 -2083 -2225 +2359 +2486 

:99 | +1783 | -1989 “2172 | .2339 | -2493 | -2637 “2773 -2901 

90 080 | ©0981 , 0-1163 | 0-1327 | 01479 | 0-1621 | ©1755 | 01883 | 0-2004 
+85 -1067 -1252 | -1419 -1574 -1718 “1854 -1982 +2105 

“90 1181 “1371 +1541 -1698 -1844 +1982 +2112 +2236 

-95 -1363 -1558 -1733 -1893 -2042 +2182 -2314 +2440 

-99 -1747 +1949 -2129 2294 | -2446 | -2588 -2722 +2848 

92 0.80 | ©0961 | 01139 | 0-1300 | 0-1450 | 0-1589 | 0-1721 | 0-1g47 | 0-1967 
-85 -1045 +1227 -1391 -1542 -1684 +1818 “1945 -2066 
-90 | -1157 -1343 -1510 1665 | -1809 ‘1944 -2072 +2195 

-95 -1336 “1527 1699 | -1857 -2003 “2141 -2271 +2395 

-99 -1712 -1911 -2088 -2250 -2400 -2541 -2673 -2798 

94 080 | 00942 | 01116 | 0-1275 | 01421 | 0-1559 | 0-1689 | o1g12 | 0-1930 
-85 -1024 -1202 -1363 -1513 1652 “1784 +1908 +2028 

-90 1134 | -1316 -1481 -1633 “1774 -1908 -2034 -2154 

95 | -1309 +1497 -1666 31821 | -1966 2101 | .2930 +2352 

99 | -1679 11874 | -2049 2200 | -2356 -2495 -2625 | 2748 

96 0.80 | 00923 | ©1094 | 0-1250 | 01394 | 0-1529 | 01657 0-1779 | 01895 
-85 -1003 1179 -1337 -1484 +1621 +1750 “1873 “1991 

-90 -1111 -1291 1453 | -1602 “1741 +1873 +1997 +2116 
-95 -1283 -1468 -1635 -1787 -1930 -2063 -2190 :2310 

:99 “1647 1839 -2011 +2168 +2314 -2451 +2579 -2701 

98 0.80 | 0-0905 | 0-1073 | 0-1226 | 0-1368 | 0-1501 | 0-1 ; -1861 
-85 -0984 1156 1312 11456 | -1591 | Ee EE rtt 

*90 ‘1090 | -1266 1425 | -1572 1709 | :1839 |  .1962 -2079 
-95 +1259 -1441 | -1604 -1755 -1895 -2026 ‘2151 :2270 

-99 -1616 +1805 ‘1975 | +2180 +2273 +2408 +2535 -2655 

100 0-80 | 0-0887 | 0-1053 | 0-1203 | 0-1343 01474 | 0-1598 | 01716 | 0-1828 
85 | :0965 | -1134 | -1288 | -1429 | .1583 | -1688 1807 | -192l 

-90 ‘1069 | -1242 | -1399 -1543 -1679 -1806 -1927 .9043 

-95 +1235 +1414 1575 | -1723 | 1861 | -1991 -2114 -2231 

:99 | -1586 | -1772 | -1939 | -2092 | .2ə34 | -2367 | .9499 | -2611 

| 102 E | 0-0871 01033 | 0-1181 | 031318 | 0-1447 | 0-1569 01686 | 01797 
0947 1113 | -1264 | -1404 -1534 -1658 | .1776 .1888 

:90 | -1049 | :1219 | -1374 ‘1516 | -1649 | -1775 | -1894 -2008 

35 | -1212 | -1988 | -1547 | -1602 | 1829 | 19g] | 9078 | -2193 

-99 3557 | -1741 1905 | -2056 -2196 2327 «2451 -2568 


104 0-80 | 0.0855 | 01014 | 0-1160 01295 | 0-1422 | 0-1542 | 01657 | 0-1766 
'85 -0930 | +1093 -1241 +1379 1508 | -1630 1746 | -1856 

:90 -1030 +1197 “1349 +1489 1620 | -1744 +1862 1974 

“95 -1190 -1363 -1519 1663 | -1797 | -1923 -2043 -2157 

-99 +1529 | 1710 | 1872 | +2021 | -2159 | 2288 | -2411 | +2527 

106 0-80 | 0-0839 | 0.0996 | 01139 | 031272 | 0-1397 | 0-1516 | 0-1629 | 0:1737 
"85 | -0913 -1074 +1220 +1355 1482 | -1602 | -1716 | -1826 

“90 ‘1011 | +1176 +1326 1464 | -1593 | -1715 1831 | -1942 
95-1169 -1340 -1493 , -1635 1767 | -1891 "2009 — -2122 

“99 ‘1503 -1681 | 184 | -1987 -2123 ‘2251 | -2372 | -2487 

108 0-80 | 0.0824 | 00979 | 0-1120 | 0-1251 | 0-1374 0.1490 | 0-1602 | 0-1708 
'85 -0897 +1055 -1198 1332 | -1457 1575 688 | .1796 

:90 | .0993 | -1156 | -1303 11439 | -1566 +1686 1801 | -1910 

- 95 | 4149 1316 | -1468 | -1607 | -1737 | -1860 11977 | -2088 
99 | 1477 1652 | -1810 | -1954 2089 | .2215 | -2334 | -2448 

110 0-80 9.0810 | 0.0962 | 0-1100 | 0-1229 | 0-1351 | 0-1466 | 0-1576 | 0-1681 
85 0881 — -1037 -1178 11309 | -1433 1649 | -1661 11767 

:90 | .o976 | 1136 | -1281 | -1415 | -1540 | -1659 | 1772 -1880 

“95 +1129 11294 | -1443 +1581 | -1709 “1830 1945 | -2055 

:99 | .1452 | -1625 | -1780 | -1923 -2056 2180 | -2298 -2410 

112 080 | 0:0796 | 0:0945 | 0-1082 | 0-1209 | 0-1329 | 0-1442 | 0-1550 | 0-1654 
‘85 -0866 11019 | -1158 | +1288 | -1409 | -1524 | -1634 +1739 

' «90 -0960 -1117 | -1259 | -1391 -1515 -1632 11744 | -1850 

-95 4110 | -1273 1419 | -1555 -1682 +1801 1915 | 2093 

z= -99 -1428 | -1598 | -1752 ‘1892 | -2023 2146 | -2263 | -2373 
ll4 i T -0930 | 0-1064 0-1189 0-1307 0:1419 0-1526 0-1628 
pe Ec — 1139 | -1267 -1387 -1500 1608 | -1712 

-90 0944 1098 |  -1239 :1369 —— -1491 1606 | -1716 | 1822 

| “95 +1091 +1252 | -1396 | -1530 1655 +1773 1885-1992 
“99 1405 .1573 | -1724 -1863 -1992 +2113 2228 | -2338 

ne ius | cose | aaa | paces | onze | ove | our | oteo | 0-1603 
“85 -0837 «09086 | -1121 1247 | -1365 :1477 | -1583 | -1686 

-90 -0928 1080 | -1219 1347 | -1468 +1581 1690 | -1794 

| | «15 11629 | -1746 1856 | -1962 

*95 “1074 1231 | -1374 1506 * as 

4 99 -1382 -1548 -1697 | -1834 “1961 | -2082 2195 | -2303 
148 go 0-0757 | 0-0900 | 0-1030 | 01152 | 01260 | 01375 | 0-1479 | 0-1579 
“85 -0824 -0970 .1103 +1227 +1343 +1454 :1559 | -1660 

| +1445 ‘1557 | -1664 -1767 

-90 0913 31063 |  -1200 11326 | 

1604 | -1719 1829 +1933 

“95 +1056 +1212 1353 | -1483 | as us disi 

jd ee | ee be Ee Nee 6247 | 0-1885 

D | H1247 | e l "1457 

720 Qs | por, | o0895 | 01014 uM END -1432 | 1536 | -1636 
“85 -0810 -0954 1086 | 1 je peri 524 1640 En 

:90 -0898 -1046 -1181 13 f m P ou T809 1908 

95 | 1040 | -1193 | +1332 | -1460 | .1903 | 2021 | 2132 | .2237 

99 | .1339 -1500 11645 | -1779 | d 

122 | 1116 | 01228 | 01334 | 0-1435 | 0-1532 
9-80 | 0.0733 | 0-0871 | 00998 0 303 | -1410 | -1513 | -1612 | 
'85 -0798 -0940 | +1069 3119» :1401 1511 +1615 ‘1715 | 

“90 0884 -1030 "1163 yo 557 | -1669 | -1775 | -1877 

“95 -1023 -1175 -1311 “1438 -1876 -1991 31010 | -2206 

. 99 | «1319 | -1478 | 1621 | -1753 |. | 
124 0-80 | 00721 | 00858 | 00983 | 01100 0-1209 | 01314 | O-1414 | 0:1510 
-85 | -0785 | -0925 1053 | -1172 -1283 "1390 -1491 "1588 
-90 | -0871 | -1014 | -1145 | -1267 -1381 -1489 -1592 ‘1691 
-95 | 1008 | -1157 | +1292 | -1417 -1534 -1644 -1750 ‘1851 
-99 :1299 | -1456 "1597 | -1727 +1849 +1963 -2072 +2175 
126 0:80 | 0-0710 | 00845 | 0-0968 | 0-1083 | 0-1192 | 0-1295 | 01394 | 0-1488 
85 | -0773 | -0911 | -1037 1154 | -1265 | -1369 -1470 -1566 
-90 :0857 | -0999 1128 | -1248 31361 | -1467 -1569 +1667 
95 | -0992 | -1139 | -1273 | -1396 | -1512 -1621 -1725 -1825 
99 | +1279 | -1434 | -1574 | -1702 | -1822 -1936 -2043 +2146 
128 0-80 | 0-0699 | 00832 | 0-0954 | 0-1067 | 01174 | 0:1276 | 0-1374 | 01407 
85 | -0761 | -0898 | -1022 | -1137 -1246 -1350 -1449 “1544 
90 | +0844 | .0984 | -1112 | -1230 | -1341 -1446 -1547 1644 
95 | -0978 | 1123 | -1254 | -1376 -1490 -1598 ‘1701 1799 
99 | -1261 | -1413 | -1551 -1678 -1797 +1909 -2015 +2116 
130 080 | 0-0689 | 0-0820 | 0-0940 | 0-1052 | O-1158 | 0-1258 | 0-1354 | 0-1447 
'88 | 0750 | -0884 | -1007 | .1121 | -1229 | -1331 | -1429 | -1522 
:90 | .0832 | -0970 | -1096 | -1212 -1322 -1426 +1526 -1621 
95 | +0963 | -1106 | -1236 | -1356 +1469 1576 -1678 “1775 
99 | -1242 | +1393 | -1529 | -1655 “1772 -1883 -1988 2088 
132 0.80 | 00679 | 0-0808 | 0-0926 | 01037 | 0-1141 | 0-1241 | 0-1336 | 0-1427 
85 | -0739 | -0872 | -0992 | -1105 1211 -1312 +1409 +1502 
90 | -0820 | -0956 | -1080 | -1195 | -1304 -1406 +1505 +1599 
95 | -0949 | -1091 | -1219 | -1337 1449 1554 | -1655 -1751 
99 | -1225 | 1374 | -1508 | -1632 “1748 -1857 -1961 -2060 
134 0.80 | 00669 | 0-0797 | 00913 | 0-1022 | 01125 | 0-1223 | 0-1317 | 0-1408 
85 | -0728 | -0859 | -0978 | -1090 | -1195 1294 1390 -1481 
90 | -0808 | -0942 | -1065 | -1179 1286 | -1387 -1484 +1578 
95 | -0936 | +1075 | -1202 | -1319 1429 -1533 -1633 +1728 
99 | +1207 ‘1355 | -1488 | -1610 1725 | -1833 -1936 2034 
136 080 | 0-0660 | 00785 | 0-0901 | 0-1008 | 0-1110 | 0-1207 | 91300 | 0:1389 
85 | +0718 | -0847 | -0965 | -1075 -1178 1277 371 -1462 
90 | +0797 | -0929 | -1050 | -1162 | -1268 -1369 -1465 -1557 
‘95 | -0923 | -1060 | -1185 | -1301 | -1410 | -1513 | Aen | -17065 
99 | -1191 1336 | -1467 1589 | -1702 -1809 -1910 2007 
138 080 | 00650 | 00774 | 0-0888 | 0-0904 | o-1095 | 0-191 0.1282 | 0-1371 
"85 | 0708 | -0835 | -0952 | -1060 | -1162 | .1280 -1353 1442 
-90 -0785 -0916 -1036 "1147 *1251 :1350 :1445 1536 
+95 +0910 +1046 -1169 -1284 -1391 -1493 -1590 -1683 
-99 “1174 -1318 -1448 -1568 -1680 -1785 -1886 -1982 
140 080 | œ : ; i 
2 ou 0-0764 | 0:0876 | 0-0981 | 0-1080 | o-1175 | 01286 | 0:1353 
8 | -0824 | -0039 | -1046 | 1147 | «i343 | .1335 | -1424 
"90 | 0775 | -0904 | -1022 | -1131 | 1235 -1333 1427 1517 
"96 | 0897 | -1032 | -1153 | -1267 | .1373 -1473 | -1569 | -1662 
E WR uo MBE | 3001 | -1429 | «isdn | oS | -1785 1862 | -1957 
0:80 | 0-0632 | o i ; 
Be ih 0-0753 0-0864 0-0968 0-1066 0-1160 0-1249 0-1336 
e -0813 -0926 -1032 -1132 -1227 -1318 -1406 
a te Eu 1008 | -1116 | -1218 | -isis | aap | -1497 
^ . -1018 -1138 -1250 +1355 145 641 
:1455 i d 
:99 | +1143 | -1283 | -1410 | -1527 -1637 -1740 ded -1933 
144 0-80 | 00624 | 0-0743 | 0-0853 | 0-0955 | 0-1052 | 0-1145 | 0-1233 | 0-1319 
-85 -0679 -0802 | -0914 ‘1018 | -1117 | -1211 | 1301 -1388 
‘90 | -0754 | -0880 | -0995 -1102 -1203 | -1299 -1390 1478 
95 | -0874 -1004 | -1123 -1234 -1338 | -1436 | -1530 +1620 
-99 -1128 -1267 | +1392 | -1508 1616 | -1719 -1816 -1909 
| 146 0-80 | 0-0616 | 00734 | 00842 | 00943 | 01039 | ©1130 | 01218 | 0-1302 
-85 | -0670 | -0791 | -0902 | -1005 -1103 | -1196 -1285 1371 
90 | 0744 | -0868 | -0982 -1088 1187 -1282 | -1373 -1460 
95 | -0862 -0991 -1109 -1218 | -1321 ‘1418 -1511 | -1600 
99 | -1114 -1251 | -1374 | -1489 -1596 | -1697 | -1794 +1886 
148 0-80 | ©0608 | 00724 | 0-0831 | 0-0931 | 0-1026 | 0-1116 | 0-1203 | 0-1286 
“85 -0662 -0781 -0890 -0992 -1089 1181 -1269 +1354 
j -90 -0734 -0857 -0969 -1074 1173 -1266 -1356 -1442 
| = -95 -0851 -0979 -1095 -1203 -1304 +1401 1493 | -1581 
-99 -1099 -1235 -1357 -1470 -1577 | -1677 1772 | -1864 
150 0-80 | 0.0600 | 00715 | 00820 | 00919 | 0-1013 | 01102 | ©1188 | 0-1270 
*85 | -0653 0771 | -0879 -0980 -1075 -1166 +1253 -1337 
-90 :0728 -0846 -0957 -1060 +1158 +1251 -1339 +1425 
+95 -0840 -0966 -1081 -1188 1288 | +1383 -1474 -1562 
-99 +1085 +1219 -1340 +1452 +1557 -1657 -1751 -1842 
152 0-80 | 0.0592 | 00706 | 00810 | 0-0908 | 01000 | 0-1088 | 0-1173 | 0-1255 
‘85 -0645 :0761 -0868 -0968 1062 | -1152 -1238 -1321 
-90 -0716 .0836 | -0945 | -1047 | -1144 +1235 -1323 -1408 
-95 -0829 -0954 | -1068 -1173 +1273 +1367 +1457 -1543 
-99 +1072 -1204 -1324 -1435 -1539 -1637 -1731 -1820 
154 0.80 | ooss4 | 0.0697 | 0-0800 | 00896 | 00988 | 0-1075 | 0-1159 | 0-1240 
-85 -0637 -0752 -0857 -0956 -1049 -1138 1223 +1305 
+90 -0707 -0825 -0934 +1035 +1130 +1221 -1308 +1391 
95 | -0819 -0942 -1055 -1159 +1257 | -1350 -1440 -1525 
-99 -1059 -1190 -1308 -1418 +1521 -1618 -1710 -1799 
156 0-80 0:0577 0 0688 0-0790 0:0886 0:0976 0:1062 0:1145 0:1225 
“85 -0629 -0743 -0847 -0944 -1037 +1125 -1209 -1290 
-90 0698 -0815 -0922 -1022 1116 | :1206 :1292 :1375 
+95 -0809 -0931 +1042 +1145 +1242 +1335 +1423 +1507 
4 -99 -1046 +1175 +1293 +1401 +1503 :1599 -1691 “1779 
158 0.80 | 0.0570 | 0-0680 | 0-0781 | 0-0875 |. 0-0964 | 0-1050 | 0-1132 | 0-1211 
-85 -0621 -0734 | -0837 -0933 -1024 ‘1111 +1195 +1275 
-90 -0689 -0805 -0911 -1010 -1103 | -1192 1277 :1359 
“95 0799 -0920 -1029 -1132 +1228 +1319 +1406 +1490 
-99 -1033 1161 “1277 -1385 -1485 -1581 -1672 -1759 
160 o ; .0672 | 0-0771 | 0.0865 | 00953 | 0-1038 | O-1119 | 0-1197 
Ee T Moris -0827 -0922 -1012 -1098 ‘1181 -1260 
-1090 -1178 -1262 +1343 
-90 -0681 -0796 -0900 -0998 
.1213 -1304 -1390 -1473 
“95 -0789 -0909 -1017 -1118 JU p 
-99 -1021 114g | -1262 | -1369 | -1468 | 1563 
162 0.80 | 0.0556 | 0-0664 | 00762 | 0-0854 | 0:0942 er eo Er 
85 | oeo | -on16 | -os17 | -0911 | C | cumes | oda | 1828 
90 | -0673 | -0786 | -oseo | -0986 | "1075 | ono | -1375 | -1457 
95 | 0780 | -os98 | 1008 | 1108 | "19D abd 3634 | -1720 
:99 -1009 +1134 +1248 +1353 +1452 : 
164 0-80 | 0-0550 | 0-0656 | 0-0753 | 0-0845 | 0.093 i 
85 | -0599 | -0708 | -.0807 | -0901 0688 po | uA | 
reo | 10666 | -0777 j -0879 | -0975 | 1066 | 1181 | 1234 
"95 | 771 | -os88 | -o904 | -1003 | .i186 | .1275 | 1256 | 
"9 | 07 | 2i | 1294 | -iaag | agg | vines rer | 
166 0-80 | 0-0543 | 0-0648 | o . " i | 
‘85 | -0592 | -0699 p por | pos "met | ae 
20 | -0657 | -0768 | .0869 | .0964 osi | kon | ke | 
. 0762 | -0877 | -0982 | -1080 E 1 
" i 
:99 | -0986 | -1109 | .1220 | .1393 dum | FH “1608 | 
168: 080 | 00537 | 0-0641 | 0.0736 | 0-0825 | 0.9910 - à 
'85 | -0585 | .0691 | .0789 | -osso | Ege || ee 
90 | 06049 | -0759 | .0859 | .0953 | 1942 | 1029 | 1129 | 
95 | 0753 | 0867 | -0971 | 1068 | oiie an | M 
399 | -0975 | -1096 | .1208 | 1308 | ciaoo | 1249 | -1330 
170 080 | 00531 | 0.0633 | 0.0728 | 0.0816 | o ce | amm 
'85 | 0578 | .0684 | .0780 | -os7o | dm e Ihe 
40 | 10049 | 0761 | -oss0 | .0942 | crog | 2088 | -1118 
‘9S | 39745 | -os58 | 0961 | qose | 11472 | UE| 1194 
:99 | 0904 | -1084 | .1193 | -1294 -1389 As au 
172 080 | 00525 | 0-0626 | 0-0720 | 0.0807 0-089 m 
35 | -0572 | 0676 | ommi | -oser | opao | 70970 | 0-1046 
‘90 | -0635 | .0742 | .9840 | -0932 dom h | 1 
SE | 29599] cS | pong | toss | iisa | tee? | Tisi 
"99 | -0952 | -1072 | «1180 | 1980 | aaa | 2229 | -1301 
174 080 | 0-0519 | 0-0619 | o-0712 | 0-0798 0-0880 | mel m 
:85 | -0565 | .0668 | .0763 | .os5i os | 059 | 1035 
"e I | 0035 | -1016 | . 
? 0734 | -0831 | -0922 | -1008 A 
go | 10728 | -0839 | .o940 | 1o34 | ‘mag | “2090 | -1168 
‘99 | -0943 | -1060 | .1167 | -1266 Jaso | e| 2097 | 
176 080 | 00513 | 0-0612 | 0.0704 0.0790 | 00871 | e Bur 
‘85 | -0559 | .0661 | -0755 | 9845 | doas ius | en 
3 :0621 | -0726 | :0822 | -o912 0997 | 9 | 108 
99 | .0120 | -0830 | .0929 | 1093 | ‘imo | 1978 | -1156 
= -0932 1048 | «gues: | «1853 | isis | 222% | 274 
p 0:0507 | 00606 | 0-0696 | 0-0781 | 0.0865 | m | Lu | 
-90 | 095 | -0054 | -0746 | .0833 | opie | 77999 | O-l013 
a ‘0614 ‘0718 | -o813 | .0902 | oggy | 9994 | -1070 
-99 | spams | -0821 | .0920 | -1012 | ‘gg, | 1997 | -1144 
m 922 | .1038 | -1143 | 70240 | igg; | 7181 | 1261 
u 0-0502 0-0599 | 0-0689 0:0773 | 0-08 pend ‘1007 
90 | 47 | -0047 | -0739 | 0824 | ogas | 70929 | o-1003 
‘95 | 0007 | -0710 | -0805 | -osag | ‘ooze | 198€ | -1059 
99 | 04.5 | 0812 | .0910 | 100] | gi) | 1996 | -1132 
"DAN IE AM I ams | 189 | wa 
agg | (0497 | 0-0592 | 0.0881 | O-0785 | 0-08 ee [ Uem 
:90 | oe | -0040 | -O731 | -ogis | 0944 | 00919 | 0.9999 
‘op | 2001 | .0703 | -o706 | 0883 gos | 294 | «1048 
op | 0597 | -0803 | -o900 | sdo | (0886 | -1045 | iior 
-0903 Arg inm | š = || -1076 | +1157 ; | 
| 1215 -13 | 1235 
| | | 04 | -1390 -1471 


Column $\nu_2 = 10$ for the rows $\nu_1 = 164$ to $182$ above:

$\nu_1$ | $P = 0.80$ | 0.85 | 0.90 | 0.95 | 0.99
164 | 0.1170 | 0.1232 | 0.1313 | 0.1441 | 0.1701
166 | 0.1157 | 0.1219 | 0.1299 | 0.1425 | 0.1683
168 | 0.1144 | 0.1205 | 0.1285 | 0.1409 | 0.1665
170 | 0.1132 | 0.1192 | 0.1271 | 0.1394 | 0.1647
172 | 0.1120 | 0.1179 | 0.1257 | 0.1380 | 0.1630
174 | 0.1108 | 0.1167 | 0.1244 | 0.1365 | 0.1613
176 | 0.1096 | 0.1154 | 0.1231 | 0.1351 | 0.1597
178 | 0.1085 | 0.1142 | 0.1218 | 0.1337 | 0.1580
180 | 0.1074 | 0.1131 | 0.1206 | 0.1323 | 0.1565
182 | 0.1063 | 0.1119 | 0.1194 | 0.1310 | 0.1549
184 0-80 :0491 | -0587 | .0674 | -0757 | -0835 :0910 | -0982 | -1052 
-85 20535 -0633 | 0723 -0807 — -0887 -0964 | -1037 -1108 
90 | -0595 ' -0695 0788 | -0874 | -0956 1034 | -1109 | -1182 
“95 -0690 0795 | -0891 | -0981 | -1065 | -1146 +1223 +1297 
99 | -0894 +1006 1108 | -1202 | -1291 | -1376 | -1456 | -1534 
186 0-80 -0486 -0580 0667 ^ -0749 | -0826 | -0901 -0972 “1041 
E -0530 -0627 -0716 0799 | -0878 | -0954 -1027 | -1097 
90 | -0588 -0688 -0780 0865 | -0946 11024 | -1098 | -1170 
95 | -0683 :0787 *0882 -0971 -1054 “1134 | -1211 71284 
-99 | 0884 -0995 1006 -1190 -1278 1362 | -1442 | -1519 
188 0-80 | 0-0481 | 0-0574 | 0-0661 | 0-0741 | 0-0818 | 0-0892 0-0962 | 0-1031 
E -0524 -0620 0708 | -0791 0869 | -0944 | -1016 -1086 
-90 "0582 | -0681 :0772 , -0857 0937 | -1014 | -1087 | -1158 
-95 | -0676 -0779 :0873 | -0961 | -1044 | -1123 -1199 +1272 
:99 | -0875 :0085 | -1085 “1178 | -1266 +1349 +1428 +1504 
190 0-80 | 00476 | 00569 | 0-0654 | 0-0734 | 0-0810 | 0-0883 0-0953 | 0-1021 
“85 0519-0614. | -0701 -0783 -0861 -0935 -1006 -1075 
-90 -0576 -0674 -0764 -0848 -0928 -1004 -1077 +1147 
-95 -0669 0771 :0864 | -0951 -1033 -1112 -1187 +1259 
-99 -0867 0976 | -1075 | -1167 1283 | +1336 | -1414 -1490 
192 0.80 | 00471 | 0-0563 | 0-0647 | 0-0727 | 0-0802 | 0-0874 | 0-0944 0-1011 
-85 -0514 -0608 -0694 | -0775 -0852 -0926 -0997 | -1065 
E -90 | -0570 -0667 ‘0756 | -0840 | -0918 | -0994 1066 ; -1136 
-95 -0662 | -0763 -0856 -0942 | -1023 ‘1101 | -1175 -1247 
-99 -0858 -0966 -1064 ‘1155 | -1241 | -1323 | -1401 | 1476 
194 080 | 0-0467 | 0-0557 | 0-0641 | 00719 | 0-0794 | 0-0866 | 00935 | 0-1001 
-85 | -0508 -0602 0687 | 0768 | -0844 0917 | -0987 ' -1055 
-90 -0565 -0661 -0749 :0831 | +0910 | -0984 | -1056 | -1125 
-95 0655 | -0756 -0847 -0933 -1013 10900 | -1164 +1235 
-99 -0849 | -0956 -1054 -1144 | -1229 | -1310 -1388 -1462 


This table gives the values of $x$ for which $\Pr(\theta_{\max} < x) = I_x(3; p, q) = P$, where $p = \tfrac12(\nu_2 - 2)$, $q = \tfrac12(\nu_1 - 2)$.


STATISTICAL ANALYSIS USING LOCAL PROPERTIES OF
SMOOTHLY HETEROMORPHIC STOCHASTIC SERIES


By G. H. JOWETT
Department of Statistics, University of Sheffield


SUMMARY. A widening of the concept of stationarity leads to the concepts of smooth heteromorphy
and local homomorphy in stochastic series, obviating much of the need for the introduction
of trend into structural specifications of statistical data. It is shown that formulae for sampling
properties of local statistics in stationary stochastic series are applicable as they stand to series having
these less restricted properties.


1. INTRODUCTION


In a recent paper (1955a) the author established the principle that the sampling properties,
and notably the standard errors, of statistics constructed from local comparisons of terms in
stationary normal stochastic series were approximately deducible from the short term
variational properties of the series themselves. In making practical use of such statistics, it
is not necessary to know the mean, variance, or serial correlations of the series, all of which
... which constitute the data are much ...

This principle is important because ... techniques, including the following:

(i) Linear (1952; Hebden & Jowett, 1952) and spatial (1955a) systematic sampling.
(ii) Trend-reduced regression analysis (1955b).
(iii) Accuracy of serial variation statistics (1955a) and the fitting of serial variation
curves (Davies & Jowett, 1956).
(iv) Systematic linear experimental arrangements, and the comparison of cycle phase
means and interpenetrating sample means (1955c).
(v) The significance of changes in level between successive stretches from the same time
parameters of such series are approximately invariant under translation, locally homomorphic.
These terms are new, and the second is proposed as being more acceptable on
semantic grounds than the tautological term locally stationary which is suggested by current
terminology. It is also useful to use the unqualified concept of homomorphy (contr. heteromorphy)
as implying effective invariance of a parameter under translation between certain
limits only, not under all translations as in stationary series.

These new definitions are important from the theoretical point of view; smoothly heteromorphic
and locally homomorphic series may be incorporated as error terms into models for
data involving successively recorded elements without the embarrassing theoretical implications
(e.g. of indefinite potential continuation) implicit in assuming stationary error terms;
furthermore, they may be used in many circumstances where the assumption of strict
invariance under translation would clearly not be justified. Their practical importance lies
mainly in permitting the extended use of formulae already established for stationary series
in (i)-(v), an extended use for which the formulae require little if any modification. Hence,
when the occasion for practical use of these formulae arises, the question of whether the
series involved are strictly stationary often need not even be considered.

In establishing this as a general principle, the exposition which forms the nucleus of this
paper has necessarily become rather generalized and abstract; yet since the way in which the
principle works is fundamentally simple, it will be demonstrated first in relation to a particular
formula, taken from (iv) and used for a similar purpose in (1955a).


2. DEVELOPMENT OF IDEAS IN RELATION TO A SPECIFIC EXAMPLE 


Suppose $x(t)$ to be a random function of $t$, such as a statistical time series variate for continuous
time $t$. For any particular value $t_\alpha$ of $t$, $x(t_\alpha)$ will be taken to have a probability
distribution with mean $\mu(t_\alpha)$ and standard deviation $\sigma(t_\alpha)$; in general these parameters will
not be constant for all values of $t_\alpha$. For any pair of values $t_\alpha$, $t_\beta$, the covariance
$\operatorname{cov}[x(t_\alpha), x(t_\beta)]$ will be a function of both $t_\alpha$ and $t_\beta$, not merely, as in stationary
series, of their difference, and will be related to the serial variation function

\[ \delta(t_\alpha, t_\beta) = E[\tfrac12 (x(t_\alpha) - x(t_\beta))^2], \tag{1} \]

which is more useful than the covariance in practical work, by the formula

\[ \delta(t_\alpha, t_\beta) = \tfrac12[\mu(t_\alpha) - \mu(t_\beta)]^2 + \tfrac12\sigma^2(t_\alpha) + \tfrac12\sigma^2(t_\beta) - \operatorname{cov}[x(t_\alpha), x(t_\beta)]. \tag{2} \]
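
Formula (2) follows from (1) by a short expansion. The steps are sketched here for convenience, writing $x_\alpha = x(t_\alpha)$, $\mu_\alpha = \mu(t_\alpha)$ and $\sigma_\alpha = \sigma(t_\alpha)$; this shorthand is introduced only for the sketch and is not part of the original notation.

```latex
\begin{align*}
\delta(t_\alpha, t_\beta)
  &= \tfrac12 E\bigl[(x_\alpha - x_\beta)^2\bigr] \\
  &= \tfrac12 \bigl\{ \operatorname{var}(x_\alpha - x_\beta)
        + [E(x_\alpha - x_\beta)]^2 \bigr\} \\
  &= \tfrac12 \bigl\{ \sigma_\alpha^2 + \sigma_\beta^2
        - 2\operatorname{cov}(x_\alpha, x_\beta) \bigr\}
     + \tfrac12 (\mu_\alpha - \mu_\beta)^2 .
\end{align*}
```

In a stationary series the mean and variance are constant, so the expression reduces to $\sigma^2 - \operatorname{cov}(x_\alpha, x_\beta)$, a function of the separation $t_\alpha - t_\beta$ alone.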


A simple statistic $U$, which might be used to test the significance of the difference between
two phase means, at interval $\theta w$ $(0 < \theta < 1)$, of a suspected cycle of period $w$, is given by the
formula

\[ nU = [x(T) - x(T+\theta w)] + [x(T+w) - x(T+w+\theta w)] + \ldots + [x(T+(n-1)w) - x(T+(n-1)w+\theta w)]. \tag{3} \]
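
As an illustration only, the statistic in (3) can be computed for a series recorded at unit spacing. The function name and the requirement that $T$, $w$ and $\theta w$ be whole numbers of sampling steps are assumptions of this sketch, not part of the original formulation.

```python
import numpy as np

def phase_difference_statistic(x, start, period, lag, n):
    """Return U of formula (3) for a series sampled at unit spacing.

    Assumptions of this sketch (not from the source): `x` holds x(t) at
    integer t, `start` is T, `period` is w, and `lag` is the integer
    offset theta * w, so that 0 < lag < period.
    """
    x = np.asarray(x, dtype=float)
    if not 0 < lag < period:
        raise ValueError("lag must satisfy 0 < lag < period")
    # Phase points T, T + w, ..., T + (n - 1) w.
    base = start + period * np.arange(n)
    if start < 0 or base[-1] + lag >= x.size:
        raise ValueError("series too short for the requested cycles")
    # nU is the sum of the n paired differences, so U is their mean.
    return float(np.mean(x[base] - x[base + lag]))
```

For a series that repeats the pattern 1, 0, -1, 0, taking `period=4` and `lag=2` compares the crest with the trough of each cycle.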


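The sampling variance of $U$ is given by

\[ n^2 \operatorname{var} U = \sum_{\alpha,\beta=0}^{n-1} \operatorname{cov}\{[x(T+\alpha w) - x(T+\alpha w+\theta w)],\ [x(T+\beta w) - x(T+\beta w+\theta w)]\}, \tag{4} \]

\[ = \sum_{\alpha,\beta=0}^{n-1} \{\operatorname{cov}[x(T+\alpha w), x(T+\beta w)] + \operatorname{cov}[x(T+\alpha w+\theta w), x(T+\beta w+\theta w)] - \operatorname{cov}[x(T+\alpha w), x(T+\beta w+\theta w)] - \operatorname{cov}[x(T+\alpha w+\theta w), x(T+\beta w)]\}, \tag{5} \]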
obtained by averaging semi-squared differences, can reasonably be used as a basis for estimating
$\delta^*(\tau)$; this function may be fitted to them as for stationary series, since they too have
sampling properties which may be shown to be robust under smoothly heteromorphic
departures from stationarity; furthermore, if $x(t)$ is locally homomorphic, they will have
real descriptive meaning for the smaller values of $\tau$. Tests and measures of heteromorphy
would not be difficult to devise, but for the practising statistician they would fill
a rather subsidiary need, like tests of normality in classical statistics; he would usually be
content to assume smooth heteromorphy in the absence of conspicuous evidence against it.

Again, the concept of smooth heteromorphy reduces the need to incorporate the somewhat
indeterminate notion of trend into specifications for data, particularly when such trend
would not be evolutionary in character, but merely a smooth slow movement with unspecified
variational properties. It is not usually possible to distinguish such a trend from the
longer term aspects of serial correlation in observed data; fitting it is apt to be an arbitrary
procedure, and in analysis parameters have to be introduced to specify it. Accordingly, it is
useful to be able to dispense with it, in this local type of analysis at least; in the specifications
required it is usually possible to replace trend plus stationary component by a single
smoothly heteromorphic component.

The concepts of smooth heteromorphy and local homomorphy are readily generalized to
concomitant variables, spatial series, and parameters of higher order (leading to the concept
of local normality); such a general treatment is given in the next section.
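
The averaging of semi-squared differences mentioned above can be sketched as follows. The function name and the restriction to an equally spaced record are assumptions of this illustration; it is not offered as the author's own computing procedure.

```python
import numpy as np

def empirical_serial_variation(x, max_lag):
    """Average semi-squared differences of an equally spaced series.

    Returns an array d with d[r] equal to the mean of
    0.5 * (x[t + r] - x[t]) ** 2 over all available t, for
    r = 0, 1, ..., max_lag.  Equal spacing is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    if not 0 <= max_lag < x.size:
        raise ValueError("max_lag must lie between 0 and len(x) - 1")
    d = np.zeros(max_lag + 1)           # d[0] is zero by definition
    for r in range(1, max_lag + 1):
        diff = x[r:] - x[:-r]           # all pairs at separation r
        d[r] = 0.5 * np.mean(diff ** 2)
    return d
```

For a pure linear movement with slope $b$ the result is $\tfrac12 b^2 r^2$ at separation $r$, which is small at small separations when the movement is slow.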


3. GENERAL THEORY 


For simplicity of notation and exposition, the general theory will be developed only for the
case of a univariate stochastic series $x(t)$. The ideas and proofs, however, extend readily to
the case of a vector random function $x(t)$ with components $x_1(t), x_2(t), \ldots$ defined at points $t$
of a multidimensional space $S$ (cf. Jowett (1955a)), there being no concept which is not
obvious that occurs in passing to a vector function and to a higher dimension.

The means $\mu(t)$, standard deviations $\sigma(t)$, covariances $\operatorname{cov}(x(t_1), x(t_2))$ and serial variation
functions $\delta(t_1, t_2)$ are all instances of the general concept of probability parameter function
$\lambda(t_1, t_2, \ldots)$ defining any specified parameter of the multivariate probability distribution of
$x(t_1), x(t_2), \ldots$ as a function of $t_1, t_2, \ldots$; this concept also covers higher order cumulants.

Definition of smooth heteromorphy: The stochastic series $x(t)$ is smoothly heteromorphic in the
probability parameter function $\lambda(t_1, t_2, \ldots, t_m)$ of $m$ $(\ge 2)$ phase values $t_1, t_2, \ldots, t_m$ in the interval
$R$ if, for all $t_1, t_2, \ldots, t_m$ in $R$, there exists a function $\lambda^*(t_1 - t_2, t_2 - t_3, \ldots)$ of separation only such that
$\lambda - \lambda^*$ and its first and second derivatives with respect to $t_1, t_2, \ldots$ are absolutely bounded in $R$ by
constants $M_{\lambda 0}$, $M_{\lambda 1}$, $M_{\lambda 2}$, respectively. The smoothness of the heteromorphy is characterized by the
smallness of $M_{\lambda 1}$, $M_{\lambda 2}$, which can be achieved by suitable choice of $\lambda^*$.
For functions of a single phase value, $\lambda^*$ will be taken as a constant.

If we write

\[ \varepsilon(t_1, t_2, \ldots) = \lambda(t_1, t_2, \ldots) - \lambda^*(t_1 - t_2, t_2 - t_3, \ldots), \]

this definition implies that if $a$ is a constant having the dimensions of $t$ and such that
$|\tau_1|, |\tau_2|, \ldots \le a$, and if the interval $[(t_1, t_2, \ldots), (t_1+\tau_1, t_2+\tau_2, \ldots)]$ belongs to $R$, $\lambda$ admits
a representation in the form

\[ \lambda(t_1+\tau_1, t_2+\tau_2, \ldots) = \lambda^*(t_1+\tau_1-t_2-\tau_2,\ t_2+\tau_2-t_3-\tau_3, \ldots) + \varepsilon(t_1, t_2, \ldots) + \Bigl(\tau_1\frac{\partial}{\partial t_1} + \tau_2\frac{\partial}{\partial t_2} + \ldots\Bigr)\varepsilon(t_1, t_2, \ldots) + \tfrac12 m^2 a^2\, \theta(t_1+\tau_1, t_2+\tau_2, \ldots)\, M_{\lambda 2}, \tag{11} \]

where $|\theta(t_1+\tau_1, t_2+\tau_2, \ldots)| \le 1$, $|\varepsilon| \le M_{\lambda 0}$, $|\partial\varepsilon/\partial t_1|, \ldots \le M_{\lambda 1}$. The function $\lambda^*$ will be called an
acting parameter function; it need not itself possess bounded second derivatives everywhere
in $R$.

If $x(t)$ is smoothly heteromorphic in all cumulants of the $r$th and lower orders, it will be
described as smoothly heteromorphic to the $r$th order. In many applications, smooth heteromorphy
to the second order is all that is necessary, and this can easily be shown to imply
smooth heteromorphy in the serial variation function $\delta(t_1, t_2)$, leading to the concept of an
acting serial variation function $\delta^*(\tau)$, which in practice will usually be found to have a cusp
at $\tau = 0$.

We shall be concerned with intervals of $t$ of width $2a$ (later to be equated to twice the
constant $a$ mentioned above), and with seminvariant linear local functions (s.l.l.f.'s) $L_\alpha(t)$ of
diameter $2a$ defined by Stieltjes integrals as follows:

\[ L_\alpha(t) = \int_{-a}^{+a} x(t+\tau)\, dl_\alpha(\tau), \tag{12} \]

where

\[ \int_{-A}^{+A} dl_\alpha(\tau) = 0 \quad (\text{all } A \ge a), \tag{13} \]

justifying the use of the term seminvariant. With any s.l.l.f. is associated its absolute
coefficient sum

\[ H_\alpha = \int_{-a}^{+a} |dl_\alpha(\tau)| \tag{14} \]

and first absolute coefficient moment

\[ K_\alpha = \tfrac12 \int_{-a}^{+a} |\tau\, dl_\alpha(\tau)|. \tag{15} \]

THEOREM 1. If, for $[(t_1-a, t_2-a, \ldots), (t_1+a, t_2+a, \ldots)] \in R$, the heteromorphy of the series $x(t)$
in the probability parameter function $\lambda(t_1, t_2, \ldots, t_m)$ is sufficiently smooth and if $L_\alpha(t_1), L_\beta(t_2), \ldots$
are $m$ localized seminvariant linear local functions of sufficiently small maximum diameter $2a$
having finite absolute coefficient sums $H_\alpha, H_\beta, \ldots$, then

\[ \Bigl| \int_{-a}^{+a}\!\int_{-a}^{+a}\!\cdots\, \lambda(t_1+\tau_1, t_2+\tau_2, \ldots)\, dl_\alpha(\tau_1)\, dl_\beta(\tau_2) \cdots - \int_{-a}^{+a}\!\int_{-a}^{+a}\!\cdots\, \lambda^*(t_1+\tau_1-t_2-\tau_2, \ldots)\, dl_\alpha(\tau_1)\, dl_\beta(\tau_2) \cdots \Bigr| < \epsilon, \tag{16} \]

where $\epsilon$ is a preassigned small positive quantity.

Proof. If the right-hand side of (11) is substituted in the left-hand integral in (16), the
seminvariant property of the $L$'s ensures that all terms not involving all of $\tau_1, \tau_2, \ldots$ yield
integrals which are identically zero. The remaining term satisfies

\[ \Bigl| \int_{-a}^{+a}\!\int_{-a}^{+a}\!\cdots\, \tfrac12 m^2 a^2\, \theta(t_1+\tau_1, t_2+\tau_2, \ldots)\, M_{\lambda 2}\, dl_\alpha(\tau_1)\, dl_\beta(\tau_2) \cdots \Bigr| \le \tfrac12 m^2 a^2 M_{\lambda 2} H_\alpha H_\beta \ldots, \tag{17} \]

and the quantity on the right-hand side of (17) will be less than $\epsilon$ if $M_{\lambda 2}$ and $a$ are small
enough. The theorem follows.

In Theorem 1 and subsequent theorems we are not in fact thinking of $\epsilon$ as a quantity
ultimately to be made vanishingly small, but as a quantity which must be
acceptably small for the theorems to be applied, its size being governed jointly by the
smoothness of the heteromorphy of a given series and the smallness of diameter of the s.l.l.f.'s
involved in the statistical analysis which it is proposed to apply to it. Sometimes, as for
example in jump analysis (Jowett (1955d) and Jowett & Wright (in preparation)), it is the
diameter of the s.l.l.f.'s which has to be made sufficiently small to make the theorems apply
in the case of a given series, and thus make statistical analysis relevant to some issue possible
at all.

THEOREM 2. If, for [(t;—a,t,—<, ...), (t +a, to a, ...)] € R, the heteromorphy of the series 
x(t) is sufficiently smooth to the second order, the sampling covariance of any pair of s.LL.f.’s L(t), 
L,(ty) localized at points t,, t, of R and having finite absolute coefficient sums H,, Hy and first 
moments K,, Kẹ may be made to satisfy an inequality of the form 


cov alt) 2569] [7 [^ -on en dt ndn (18) 


where ó*(r) is an acting serial variation 
diameters of the two s.l.. fs, and e is a 
Proof. 


cov I). L9] =[" [^ eovtatt eri) init) dry) (19) 


function of separation only, a is the greater of the semi- 
preassigned small positive quantit y. 


=f a {207 (ty +74) + 207 (ta +72) - (t +74, ta t7) 
; + Mt +73) — Alta + T3)]?} dl, (r,) dl;(r), (20) 
d [rcnt Mnt 7) nr) dL d (21) 


the integrals of the terms involving 7, or T only vanishing because of the seminvariant 
property of the s.l.1.f.'s. 

In (21) we may ex 
of the terms in y(t 
property, 


+a (a 
cov [L,(t,), Ly(ts)] -f ji (-9*(& 7, — fa — 73) — 2a? Myo Olti +74, 


pand the square. Because of the seminvariant property, the integrals 
1+7,) and y(t.+7,) vanish. Hence, using (11) and the seminvariant 


to + To)} dl, (7,) dly(rs) 
+a Out tT ) 
=i, frn dt, Su 3M, + 7)] dl,(r,) 


ai 
<f “(np era 
-a OL, 


*ia*M,, 0 (t, + 7) dir, (22) 
It follows that 


+a [ca 
cov [L,(t,), Ly(t5)] zii [ie —Ó*(t +T —t,— To) dl, (r.) dl;(r;) 


S20? Ms H, Hy + (2a, +. eM. H,) (24K, + 1a2M, H,). (23) 
If Myo, M, and a are sufficiently small, the right-hand side of (23) will be less than e. This 
proves the theorem.

Definition of acting normality. The stochastic series x(t) possesses the property of acting
normality in the interval R if it is smoothly heteromorphic in R with acting cumulants of order
higher than the second which may be taken as zero.


G. H. JOWETT 461


THEOREM 3. If in an interval R the heteromorphy of x(t) is sufficiently smooth to order s, and
if x(t) has the property of acting normality, then for any set of s s.l.l.f.'s with bounded absolute
coefficient sums and first moments, with diameters not greater than 2a, and involving only phase
values lying in R,


EG E. Lar O 
Blt) Lats) 1,46] = [m Cnet 25 5 wg (24) 
ita +a 
where 6G. =| | OF (Eg -- T, —t,—7,) dl, (Tq) dl, (r,), (25) 


and the summation is taken over all partitions of the suffixes 1,2, ..., s into different pairs. 
Proof. Let $\kappa(t_1, t_2, \ldots, t_s)$ be the cumulant of highest order associated with the product
$x(t_1)\,x(t_2) \ldots x(t_s)$, repetition of arguments being permissible. Thus, for example
$\kappa(t, t + \tau) = \operatorname{cov}(x(t), x(t + \tau))$, $\kappa(t, t) = \sigma^2(t)$.
From the relationship between the moments and cumulants of a multivariate distribution,
it may easily be shown that
$$E(x(t_1)\,x(t_2) \ldots x(t_s)) = \sum \kappa(t_1, \ldots)\,\kappa(\ldots) \ldots \kappa(\ldots) \tag{26}$$


where the summation is of all products of cumulants obtainable as follows: the symbols are 
partitioned into one or more subsets; the symbols of each subset are taken as the arguments 
of a cumulant; and all the cumulants resulting from the particular partitioning are multi- 
plied together. Thus, for example, 
$$E\{x(t_u)\,x(t_v)\,x(t_w)\} = \kappa(t_u, t_v, t_w) + \kappa(t_u)\,\kappa(t_v, t_w) + \kappa(t_v)\,\kappa(t_u, t_w) + \kappa(t_w)\,\kappa(t_u, t_v) + \kappa(t_u)\,\kappa(t_v)\,\kappa(t_w). \tag{27}$$
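
The partition rule behind (26) and (27) can be made concrete by direct enumeration. The following Python sketch is an illustration only and is not taken from the paper; the names `set_partitions`, `pair_partitions`, `moment_from_cumulants` and `normal_kappa` are hypothetical. It lists every partition of the argument symbols, multiplies one cumulant per subset, and sums the products. Keeping only the partitions into pairs gives the terms that remain when all cumulants other than those of the second order are taken as zero.

```python
from math import prod


def set_partitions(symbols):
    """Yield every partition of `symbols` into non-empty subsets."""
    symbols = list(symbols)
    if not symbols:
        yield []
        return
    first, rest = symbols[0], symbols[1:]
    for partition in set_partitions(rest):
        # `first` either forms a subset on its own ...
        yield [[first]] + partition
        # ... or joins one of the subsets already present.
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


def pair_partitions(symbols):
    """Partitions in which every subset contains exactly two symbols."""
    return [p for p in set_partitions(symbols) if all(len(b) == 2 for b in p)]


def moment_from_cumulants(symbols, kappa):
    """Sum over all partitions of the product of one cumulant per subset."""
    return sum(prod(kappa(tuple(b)) for b in p) for p in set_partitions(symbols))


def normal_kappa(block):
    """Assumed example: zero mean, unit variance, higher cumulants zero."""
    return 1.0 if len(block) == 2 else 0.0
```

With three symbols the enumeration produces the five terms of (27); with four symbols all referring to one standard normal variable, only the three pair partitions contribute and the fourth moment comes out as 3.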
The constitution of the summation is unaffected by coincidences in the actual values
$t_1, t_2, \ldots, t_s$. Now


FULL, (ty) L, (t) ... La (4)] 


s | +e [T1 Biet, +r) ata) +++ elts 71 Ag (r3) «++ dla rs), (28) 
- [7 [ton vee) K( Ty; --)-] dh (Ty) ++ dL (Ts) (29) 


=> [7 cene t [ET [tnm atn]. (30) 


the summations in (28), (29), (30) being defined as in (26). If the heteromorphy of x(t) is
sufficiently smooth, and a sufficiently small, every integral in (30) may be replaced, to the
order of a given $\epsilon$, by that of an acting cumulant which is a function of separation only.
Furthermore if x(t) has acting normality, only the acting cumulants of the second order
survive. Hence if we neglect terms of order $\epsilon$, the only terms of (30) which make a contribution
are those consisting entirely of products of integrals of second order cumulants. Now
typically


$$\int_{-a}^{+a}\int_{-a}^{+a} \kappa(t_q + \tau_q, t_r + \tau_r)\, dl_q(\tau_q)\, dl_r(\tau_r) = \operatorname{cov}[L_q(t_q), L_r(t_r)] \tag{31}$$

$$= \int_{-a}^{+a}\int_{-a}^{+a} -\delta^*(t_q + \tau_q - t_r - \tau_r)\, dl_q(\tau_q)\, dl_r(\tau_r) + O(\epsilon) \tag{32}$$

by Theorem 2. The proof of the theorem then follows at once. Formula (32) differs from the
corresponding formula for stationary series, established in (1955a), only in the presence of
the term $O(\epsilon)$ and the use of an acting, instead of actual, serial variation function of separation.
Accordingly, if $\delta^*(\tau)$ has sufficient tendency towards linearity with increasing $|\tau|$,
the methods of that paper may be applied and lead to the following theorem, which is
a suitably modified form of the theorem established in § 2 of that paper, and which is proved
in the same way:


THEOREM 4. The expectation of any product of powers of s.l.l.f.'s of maximum diameter 2a
of values in an interval R of a stochastic series x(t) having, in R,

(a) heteromorphy which is sufficiently smooth,

(b) acting normality,

(c) an acting serial variation function which has a sufficiently rapid tendency to linearity as
the separation $|\tau|$ increases beyond a,
may be expressed as a sum of terms which are either of magnitude at most of specified order $\epsilon$ or
involve only values of the acting serial variation function in a neighbourhood of $|\tau| = 0$ having
magnitude of order a, i.e. depending on local variational properties of x(t).

Many statistics which are of practical interest are means, or functions of means, of products
of powers of n s.l.l.f.'s which are evenly spread over a region of S, and their asymptotic
sampling moments depend ultimately on the $n^2$ covariances of the s.l.l.f.'s involved in pairs.
Because of the even spread, the number of these which are not far apart (a term defined
precisely in (1955a)) is of magnitude O(n). If x(t) has the properties specified in Theorem 4 it
is only the covariances of these which are not negligible, and these may be expressed in
terms of what are essentially local variational properties, as indicated in Theorem 4.
Following the arguments of (1955a) we may assert the principle that for smoothly
heteromorphic series the sampling properties of local statistics depend essentially on local
properties; in general the same sampling formulae may be used as in stationary series, with the
acting serial variation function playing the role of the serial variation function in stationary
series.

It will be observed that none of the theorems explicitly involve the concept of local
homomorphy, which was mentioned in § 2; it is not necessary for their application that A*
should approximate to A in any sense. Nevertheless, local homomorphy is a useful property
for a series to have; it is consistent with smooth heteromorphy, and implies the possibility of
such an approximation, namely that A* may be chosen to approximate to A when the separations
of the arguments $(t_1, t_2, \ldots)$ involved are of order not greater than a. Accordingly it is
worthy of formal definition, which is conveniently given in terms of the corresponding
limiting property.


Definition of limiting local homomo 
the limit in the interval R i 
whenever [t,, ty, of] lies in R and can 
probability parameter function r*(t, 


a t 
vagnitude a, depending on c, such tha 
be included in some interval of width 2a, an acting 
— ta, -.-) can be found which is such that 


| Alta, tay s.) A5 tay...) | ee, (33) 
For some probability parameter functions, smooth heteromorphy is sufficient for limiting
local homomorphy. An important instance is the serial variation function $\delta(t_1, t_2)$. The
definition of smooth heteromorphy for $[t_1, t_2] \subset R$ permits us to choose $\delta^*$ so that $\delta^*(0) = 0$.


Writing Pty, ty) =O (ty, te) — 0* (t, — tə), (34) 


from the boundedness of the second derivatives of 9 it follows that for some t1, t, lying in the 


int 
interval t,, t, "T -py &)- He t) (35) 
1 t2) = (ty 8t, 1» "2. 2 EN 1:"2/» 


Where £ = 4(t, 4-1). I£ | t4 —t, | «2a, where a = ¢/2M,,, 

| 95,19) | < 2aM;, = e, (36) 
thus establishing limiting local homomorphy in à. On the other hand, consideration of the 
D A(hs tes ta) = 5 tata) lt) (37) 


shows that smooth heteromorphy is not always sufficient to secure limiting local
homomorphy.


REFERENCES 


Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40, 318.
Davis, H. M. & Jowett, G. H. (1956). The fitting of locally Markoff serial variation curves. J. R.
Statist. Soc. B (in the Press).
EBDEN, Jk acd G. H. (1952). The accuracy of sampling coal. Appl. Statist. 1, 179. : 
Jowett, G. H. (1952). The accuracy of systematic sampling from conveyor belts. Appl. Statist. 1, 50.
Jowett, G. H. (1955a). Sampling properties of local statistics in stationary stochastic series.
Biometrika, 42, 160.
Jowett, G. H. (1955b). Least squares regression analysis for trend-reduced time series. J. R. Statist.
Soc. B, 17, 91.
Jowett, G. H. (1955c). The comparison of means of industrial time series. Appl. Statist. 4, 32.
Wk, E à = : ema Bangla SERIE of variance problems for time series, with industrial applica- 
tions. Manchester Statistical Society, Group Meetings sae in 
Jowett, G. H. & Wright, W. M. (in preparation). Jump analysis.


Biom. 44 
30 


[ 464 ] 


DEPENDENCE OF THE FIDUCIAL ARGUMENT 
ON THE SAMPLING RULE} 


By F. J. ANSCOMBE 


Princeton University 


Suppose that we have a series of observations, given to have been drawn independently
from a common chance distribution of specified form depending on a single unknown
parameter $\theta$. Suppose the observations have been taken according to some sampling rule such
that the eventual number of observations does not depend on $\theta$ except possibly through
the observations themselves. All recognized kinds of sequential rule, including fixed-sample-size
rules, satisfy this condition. (A type of sampling rule to be excluded would be
one where the number of observations depended on other observations subsequently
suppressed.)


It is well known that the likelihood function of $\theta$, given the observations, is independent
of the sampling rule. For in any expression for the chance of observing what has actually
been observed, the sampling rule only enters as a factor independent of $\theta$, which is therefore
irrelevant when the chances corresponding to different values of $\theta$ are compared. It follows
that the posterior probability distribution for $\theta$ derived by Bayes's theorem from some
given prior distribution is also independent of the sampling rule. (I shall refer to this use
of Bayes's theorem as 'the Bayesian argument', for short.)
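
A discrete analogue may make this independence concrete. The sketch below is an illustration and not part of the note: it assumes binomial trials with chance `theta` of success, observed either with the number of trials `n` fixed in advance or by stopping at the `r`-th success (the function names are hypothetical). The two chances of the same record differ only by a factor free of `theta`, so the likelihood functions are proportional.

```python
from math import comb


def fixed_n_chance(theta, n, r):
    """Chance of r successes when the number of trials n is fixed."""
    return comb(n, r) * theta**r * (1 - theta) ** (n - r)


def inverse_sampling_chance(theta, n, r):
    """Chance that the r-th success arrives on trial n (last trial a success)."""
    return comb(n - 1, r - 1) * theta**r * (1 - theta) ** (n - r)


def chance_ratio(theta, n, r):
    """Ratio of the two chances; it equals n / r whatever theta is."""
    return fixed_n_chance(theta, n, r) / inverse_sampling_chance(theta, n, r)
```

For example, with 3 successes in 12 trials the ratio is 4 for every value of `theta`, which is why a posterior distribution computed from a given prior is the same under either rule.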


statistical procedures; for exam 
The purpose of this note is t 


the sampling rule, 


Secondly, Lindley (1957) has shown that, if the common chance distribution of the 
observations referred to in the openi 


¢ nee above is a member of the exponent? 
family, then the fiducial argument does not hay i 


iy 
sampling rule. It is also pointed out that the same applies (in a more striking degree) to
decision procedures chosen by the minimax rule, when these are expressed in Bayesian
terms.

Thus, Fisher’s hierarchy of inferential methods, namely (in ascending order of informa- 
tiveness), (i) significance tests, (ii) contemplation of the likelihood function, (iii) the fiducial 
argument, shows a curious oscillation between heeding and ignoring the sampling rule. 
Items (i) and (iii) require for their definition that the whole sample space be defined; and 
this property is shared with Neyman’s confidence intervals, the sampling distribution of a 
statistic, and Wald’s minimax decision procedures. But for item (ii), as for the classic use 
of Bayes’s theorem, we require to know no more of sample space than the single observed 
point. It seems to me that, before Fisher’s account of statistical reasoning can be accepted 
as correct, this oscillation must be explained and made acceptable. More generally, anyone 
who claims correct understanding of statistical inference needs to explain when, why, and 
how, knowledge of the sampling rule is relevant to the interpretation of given observations. 


Examples showing dependence of the fiducial argument on the sampling rule. In constructing 
such examples, the difficulty is encountered that when the sample size is a random variable 
its distribution, being discrete, cannot be made the basis for an exact fiducial argument, 
though for large sample sizes an approximate fiducial argument may be possible. In order 
to avoid this kind of imprecision, the examples below refer, not to ordinary sampling with 
a finite number of observations, but to the sampling of a stochastic process with continuous 
time parameter. Large samples from (respectively) binomial and normal populations will 


exhibit approximately the same phenomena. 
Example 1. Let $r_\theta(t)$ denote a Poisson process with continuous time parameter t and
independent increments, such that
$$r_\theta(t + \tau) - r_\theta(t)$$
has a Poisson distribution with mean $\theta\tau$, for any positive $\tau$ and any t. Thus the jumps in
$r_\theta(t)$ occur at mean rate $\theta$ per unit time. We suppose $\theta$ to be positive but otherwise unknown.
Starting with $r_\theta(0) = 0$, let a realization of the process be observed continuously for a period,
at the end of which t has the value T and $r_\theta(t)$ has the value R, say. The likelihood function
of $\theta$, given the observations, depends only on the end-point (T, R), being‡
$$\theta^R e^{-\theta T}. \tag{1}$$


Suppose we are told that the duration of observation, T, was fixed in advance. Then R
is the observed value of a chance variable (for fixed $\theta$) having the Poisson distribution with
mean $\theta T$. No exact fiducial distribution for $\theta$, given R, can be found, but if R is large we
can use an approximate fiducial argument (Fisher, 1956, pp. 62-3), asymptotically equivalent
to the use of Bayes's theorem with likelihood (1) and prior distribution§
$$\frac{d\theta}{\sqrt{\theta}}. \tag{2}$$

‡ The likelihood function is a set of odds for $\theta$, and is therefore arbitrary up to a multiplying factor.
§ The prior distribution need not be normalized, since constant factors cancel out when Bayes's
theorem is used.

Suppose, alternatively, we are told that R was fixed in advance, and
observation stopped at the first instant when the condition $r_\theta(t) = R$ was satisfied.
30-2 
Then T is the observed value of a chance variable (for fixed $\theta$) having the distribution
$$\frac{\theta^R T^{R-1} e^{-\theta T}}{(R-1)!}\, dT. \tag{3}$$
This inverts (see Fisher, 1956, p. 53) to give a fiducial distribution for $\theta$, given T, namely
$$\frac{T^R \theta^{R-1} e^{-\theta T}}{(R-1)!}\, d\theta, \tag{4}$$
which is the same as the posterior Bayes distribution for $\theta$ derived from (1) with the prior
distribution
$$\frac{d\theta}{\theta}. \tag{5}$$
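
The agreement between the fiducial distribution under the inverse sampling rule and a Bayes posterior can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the author's computation: it takes the fiducial density to be the gamma form $T^R \theta^{R-1} e^{-\theta T}/(R-1)!$, takes the likelihood as $\theta^R e^{-\theta T}$, and normalizes the posterior by a simple midpoint rule. The function names and the integration range are hypothetical choices.

```python
import math


def fiducial_density(theta, T, R):
    """Gamma-type density T^R theta^(R-1) exp(-theta T) / (R-1)!."""
    log_density = (R * math.log(T) + (R - 1) * math.log(theta)
                   - theta * T - math.lgamma(R))
    return math.exp(log_density)


def posterior_density(theta, T, R, prior, upper=60.0, steps=60000):
    """Posterior from likelihood theta^R exp(-theta T) and an unnormalized prior."""
    def unnormalized(th):
        return th**R * math.exp(-th * T) * prior(th)

    h = upper / steps
    # Midpoint rule: avoids evaluating the prior at theta = 0.
    total = sum(unnormalized((i + 0.5) * h) for i in range(steps)) * h
    return unnormalized(theta) / total


def inverse_prior(theta):
    """Unnormalized prior proportional to 1 / theta."""
    return 1.0 / theta


def uniform_prior(theta):
    """Unnormalized uniform prior, used only for contrast."""
    return 1.0
```

With T = 2 and R = 5, the posterior under the prior proportional to 1/θ reproduces the fiducial density to within the integration error, whereas a uniform prior gives a visibly different density.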
Example 2. This is similar to Example 1 except that now both interpretations of the
observations lead to proper fiducial distributions. Let $x_\theta(t)$ denote a normal (Wiener)
process with continuous time parameter t and independent increments, such that
$$x_\theta(t + \tau) - x_\theta(t)$$
is normally distributed with mean $\theta\tau$ and variance $\tau$, for any positive $\tau$ and any t. Thus
$x_\theta(t)$ increases at the rate $\theta$ per unit time in mean and 1 per unit time in variance. We suppose
$\theta$ to be unknown. Starting with $x_\theta(0) = 0$, let a realization of the process be observed
continuously for a period, at the end of which t has the value T and $x_\theta(t)$ has the value X, say.
We shall suppose that X is positive. The likelihood function of $\theta$, given the observations,
depends only on the end-point (T, X), being
$$e^{\theta X - \frac{1}{2}\theta^2 T}. \tag{6}$$
Suppose we are told that the duration of observation, T, was fixed in advance. Then X is
the observed value of a chance variable (for fixed $\theta$) normally distributed with mean $\theta T$
and variance T. This inverts to give a fiducial distribution for $\theta$, given X, namely the normal
distribution
$$\sqrt{\frac{T}{2\pi}}\; e^{-\frac{1}{2}T(\theta - X/T)^2}\, d\theta, \tag{7}$$
which is the same as the posterior Bayes distribution for $\theta$ derived from (6) with the uniform
prior distribution
$$d\theta. \tag{8}$$

Suppose, alternatively, we are told t 
at the first instant when the condition X(t) = 


>0, the sampling rule 
t when either x(t) = ¢ 0T 


= k, where c and k are given positive numbers. If (T, X) denotes the end-point of the sample
path, either X = c or T = k. Let
$$T^* = T + c - X,$$
so that $T^* = T$ if X = c and $T^* = k + c - X$ if T = k. Then clearly $T^*$ is a sufficient statistic
for $\theta$. It is shown below (next paragraph) that the chance distribution for $T^*$, given $\theta$, has
the monotonicity property needed for the derivation of a fiducial distribution for $\theta$, given
$T^*$ (Fisher, 1956, p. 69). Note that for any sample path such that T < k, the value of k is
not needed in the fiducial argument, which is based on the cumulative distribution function
of $T^*$, given $\theta$, at the observed value of T.

To verify the monotonicity property of the distribution of $T^*$, let $F_\theta(x)$ denote the chance
that $T^* < x$. Then $F_\theta(x)$ is the chance-measure of all sample paths in the interval $0 < t < k$
for which $T^* < x$. Let each of these paths be deformed from x(t) to $x(t) + \alpha t$, where $\alpha$ is
a positive constant. All the deformed paths satisfy $T^* < x$, and if $\theta$ is changed to $\theta + \alpha$ their
measure is still $F_\theta(x)$. In general there are also other paths, besides these deformed paths,
for which $T^* < x$. It follows that $F_{\theta+\alpha}(x) > F_\theta(x)$, i.e. $F_\theta(x)$ is an increasing function of $\theta$ as
well as of x. For given positive x, $F_\theta(x)$ runs from 0 to 1 as $\theta$ runs from $-\infty$ to $+\infty$;
for given $\theta$, it runs from 0 to 1 as x runs from 0 to $\infty$.

Suppose, then, we are told that the given sample path was observed under a sampling
rule of the sort just described, and that $x_\theta(t)$ attained its prescribed upper bound (therefore
equal to X) before the truncation provision came into effect. The chance distribution for
T, given $\theta$ and X, is (see Feller, 1950, p. 296)
$$\frac{X}{\sqrt{2\pi}}\; T^{-3/2}\, e^{-\frac{1}{2}T(\theta - X/T)^2}\, dT. \tag{9}$$
(The integral of this from 0 to the truncation time k is the chance that the process attains
the bound X before truncation.) The fiducial distribution for $\theta$, given T, derived from (9)
is not a posterior Bayes distribution for any prior distribution, because the condition given
by Lindley is not satisfied (namely, that there exist transformations of T to U and of $\theta$ to $\phi$,
such that $\phi$ is a location parameter for U). However, we shall see that if X is large the
fiducial distribution is approximately a posterior Bayes distribution.†

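If we integrate (9) with respect to T and then take the differential with respect to $\theta$, we
obtain the fiducial distribution for $\theta$ in the form
$$d\theta \int_0^T \frac{X}{\sqrt{2\pi t^3}}\,(X - \theta t)\, e^{-\frac{1}{2}t(\theta - X/t)^2}\, dt.$$
After changing the variable of integration from t to $\sqrt{t}\,(\theta + X/t)$, we obtain the explicit
expression
$$2X e^{2\theta X}\, \Phi(-V)\, d\theta, \tag{10}$$
where $V = \sqrt{T}\,(\theta + X/T)$ and $\Phi(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} e^{-\frac{1}{2}u^2}\, du$.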
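
The explicit expression can be checked against a direct numerical differentiation. The sketch below is an illustration, not the author's derivation. It assumes the standard closed form for the chance that a Wiener process with drift θ and unit variance rate reaches the level X by time T, namely $\Phi((\theta T - X)/\sqrt{T}) + e^{2\theta X}\Phi(-(\theta T + X)/\sqrt{T})$, which is not quoted in the text, and compares its derivative with respect to θ with $2X e^{2\theta X}\Phi(-V)$. All function names are hypothetical.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def first_passage_cdf(theta, T, X):
    """Chance that the level X > 0 is reached by time T (assumed closed form)."""
    root_t = math.sqrt(T)
    return (Phi((theta * T - X) / root_t)
            + math.exp(2.0 * theta * X) * Phi(-(theta * T + X) / root_t))


def fiducial_density(theta, T, X):
    """Explicit expression 2 X exp(2 theta X) Phi(-V), V = sqrt(T) (theta + X/T)."""
    V = math.sqrt(T) * (theta + X / T)
    return 2.0 * X * math.exp(2.0 * theta * X) * Phi(-V)


def numerical_derivative(theta, T, X, h=1e-5):
    """Central difference of the first-passage chance with respect to theta."""
    upper = first_passage_cdf(theta + h, T, X)
    lower = first_passage_cdf(theta - h, T, X)
    return (upper - lower) / (2.0 * h)
```

For moderate values such as T = 2 and X = 1.5 the two quantities agree to within the finite-difference error. For large θX the factor `exp(2 * theta * X)` can overflow, so the sketch is meant for small illustrative inputs only.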


We now restrict attention to values of $\theta$ near X/T, according to the asymptotic specification:
$X \to \infty$, $T \to \infty$, with X/T constant, $\sqrt{T}\,|\theta - X/T|$ bounded. On using the asymptotic
expansion for $\Phi(-V)$ as $V \to \infty$, we see that (10) is


1 2X -ro-ze 1 «o(z)]a*. 


Jem V ° 
It is easy to show that y? aid T ) 
iy = 9+ 7): 


and so finally we have the fiducial distribution for $\theta$ in the form


iz ) e-tro-xin 1 +0(5) ae. (11) 


† The following proof, shorter than my original one, is due to Dr H. E. Daniels.
This is asymptotically the posterior Bayes distribution for $\theta$ derived from (6) with the prior

distribution dé (12) 
FIL a 

(This has been established for values of $\theta$ near X/T. The insertion of the modulus sign in
the denominator extends the result to include what we should have obtained if we had
supposed X to be negative.)


Remarks. (1) In Example 1, if we had not been given the complete observational record
but only the end-point (T, R), to be told that the sampling rule was the inverse one would
give us an extra piece of information about the sample path, namely, that the ordinate
jumped from R − 1 to R at the final instant T. This extra information is reflected in the
fiducial argument (when we translate it into Bayesian terms) by the substitution of the
prior distribution (2) by (5), which has the effect of attaching more weight to lower values
of $\theta$. Thus, the fiducial argument does not use the observations only in the form of the
likelihood function, the 'sufficient' statistics (T, R) are not indeed sufficient for it; further
information about the observations, not contained in the likelihood function, is required.†
This, of course, is in contrast with the classic use of Bayes's theorem, where the prior
distribution, however it may be determined, is wholly independent of the observations used
to calculate the likelihood function. (The same remark, with obvious changes, applies
equally to Example 2.)


This point applies to other kinds of statistical argument besides the fiducial; for instance,
to minimax decision procedures. From the mere fact that a procedure can be expressed in
terms of Bayes's theorem with some specially-chosen prior distribution, we cannot infer
that the procedure ignores the sampling rule, because the 'prior' distribution may depend
on the sampling rule (and so not be prior at all). This is illustrated by the minimax procedure
for estimating a binomial chance $\theta$ from a sample of fixed size n, when the loss in
quoting the estimate $\hat\theta$ is proportional to $(\hat\theta - \theta)^2$. The minimax is what would be obtained
by the Bayesian argument if the prior distribution for $\theta$ were
$$[\theta(1 - \theta)]^{\frac{1}{2}\sqrt{n} - 1}\, d\theta,$$
which depends on n. (See Hodges & Lehmann, 1950.)
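
The dependence of this 'prior' on n can be seen in a short calculation. The sketch below is an illustration and not part of the note: it assumes the prior is the symmetric beta form with both indices equal to ½√n, so that under squared-error loss the Bayes estimate is the posterior mean $(x + \tfrac{1}{2}\sqrt{n})/(n + \sqrt{n})$ for x successes in n trials. The function names are hypothetical. The expected squared error of this estimate does not vary with θ, which is the property that makes it minimax.

```python
from math import comb, sqrt


def minimax_estimate(x, n):
    """Posterior mean under the symmetric beta prior with index sqrt(n)/2 (assumed)."""
    a = sqrt(n) / 2.0
    return (x + a) / (n + 2.0 * a)


def sample_proportion(x, n):
    """Ordinary estimate x / n, for comparison."""
    return x / n


def mean_squared_error(theta, n, estimator):
    """Expected squared error of `estimator` over the binomial distribution."""
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x)
        * (estimator(x, n) - theta) ** 2
        for x in range(n + 1)
    )
```

With n = 16 the expected squared error of `minimax_estimate` is 1/(4(1 + √n)²) = 0.01 at every θ, while that of the sample proportion is θ(1 − θ)/n and so varies with θ.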

(2) Another way of regarding the examples is as problems in recognizing ancillary
statistics. In Example 1, should we think of T as ancillary to R, or of R as ancillary
to T?

(3) It is of some interest to inquire h 
of their kind. In both cases, other s 


wn parameter (), (i 
with chance equa] 


likelihood function has been termed ‘sufficient’ by Fisher. ndi 
definition carries the innuendo that the likelihood function is all 


we need to know. 
the fiducial argument. Condition (ii) is not satisfied unless the process is either continuous
(and therefore normal) or purely discrete, all jumps having the same magnitude (as with the 
Poisson process). A process of this latter sort, having negative as well as positive jumps, is 
the difference of two independent Poisson processes. There are now two parameters. 


However we select one parameter to be estimated, the process cannot (I think) be made 
to satisfy (i). 
TESTS FOR RANK CORRELATION COEFFICIENTS. I


By E. C. FIELLER, H. O. HARTLEY AND E. S. PEARSON


Statistical Advisory Unit, Ministry of Supply, London; 
Iowa State College, Ames; University College, London 


1. PURPOSE OF THE STUDY 


(1·1) The measures considered


The following is a first report on an investigation which became possible with the
availability of the 25,000 sets of correlated random normal deviates, 3000 of which were
published in Fieller, Lewis & Pearson's (1955) Tracts for Computers, no. XXVI. The object
which we set ourselves was to study with the aid of these data the sampling distributions of,
and relationships between, three measures of rank correlation, in the case where the basic
variables which have been ranked follow bivariate normal distributions.
We shall use the following notation. Suppose that there are n pairs of associated rankings

$$u_1, u_2, \ldots, u_n \quad \text{and} \quad v_1, v_2, \ldots, v_n,$$

where the integers $u_i$ (i = 1, 2, ..., n) may be taken in ascending order 1, 2, ..., n and the
$v_i$ are a permutation of these integers. We shall consider in the present paper the two
following measures of correlation between these rankings:

(a) Spearman's coefficient, which we denote by $r_S$. This is simply the product moment
correlation coefficient of $u_i$, $v_i$ and may be computed from the sum of squared differences

$$S_S = \sum_{i=1}^{n} (u_i - v_i)^2, \tag{1}$$

where
$$r_S = 1 - 6S_S/(n^3 - n). \tag{2}$$

(b) Kendall's coefficient, $\tau$, which we denote by $r_K$. This may be computed as follows:
For every integer $u_i$ count the number of $v_j$ with $v_j > v_i$ and j > i; then add these counts to
obtain the positive score $P_K$. Then

$$r_K = 4P_K/(n^2 - n) - 1. \tag{3}$$

Both $r_S$ and $r_K$ lie between +1 and −1. We shall not be concerned here with ties among
the u's or v's.
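
The two definitions translate directly into a few lines of code. The following Python sketch is an illustration under the stated conventions (the u ranks are taken as 1, 2, ..., n in ascending order and the list `v` holds the corresponding v ranks, with no ties); the function names are hypothetical.

```python
def spearman_rs(v):
    """Spearman's coefficient from the sum of squared rank differences."""
    n = len(v)
    # u_i = 1, 2, ..., n in ascending order; v[i - 1] is the matching v rank.
    s_s = sum((u - v_i) ** 2 for u, v_i in enumerate(v, start=1))
    return 1.0 - 6.0 * s_s / (n**3 - n)


def kendall_rk(v):
    """Kendall's coefficient from the positive score P_K."""
    n = len(v)
    # Count pairs (i, j) with j > i whose v ranks are in the same order.
    p_k = sum(1 for i in range(n) for j in range(i + 1, n) if v[j] > v[i])
    return 4.0 * p_k / (n**2 - n) - 1.0
```

For the ranking `[2, 1, 4, 3]` the sum of squared differences is 4 and the positive score is 4, giving 0.6 for Spearman's coefficient and 1/3 for Kendall's.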
The following is a third coefficient which has been computed for all the sampling data
and which we hope to consider later:
(c) The Fisher-Yates coefficient. Let £(i
expected value of the ith largest standardi.


moment correlation coefficient of these
scores, namely
$$r_F = \sum_{i=1}^{n} \xi(u_i \mid n)\, \xi(v_i \mid n) \Big/ \sum_{i=1}^{n} \xi^2(i \mid n). \tag{4}$$
Convenient tables of the individual $\xi(i \mid n)$ as well as of $\sum \xi^2(i \mid n)$ are given, for example,
in Fisher & Yates (1938, Tables XX and XXI). As an approximation to the actual product-moment
correlation coefficient, $r_{xy}$, in a normal sample $r_F$ clearly has much to recommend
it; but the only discussions of this coefficient of which we are aware are those by Jeffreys
(1948, pp. 209-10) and Hoeffding (1951, pp. 86-9).
(1·2) Some known results on the distribution theory of $r_S$ and $r_K$


For a comprehensive summary of the older results, the reader may consult Kendall
(1948) and Moran (1950). Briefly these are as follows:

For independent random rankings (i.e. for random permutations of the $v_i$) the complete
distributions of $r_S$ and $r_K$ have been obtained for small n by combinatorial enumeration.
Adequate approximations have been evolved for larger n.

In the case of correlated rankings it is first necessary to specify the nature of the
dependence. A discussion of this problem of appropriate population models was given by Daniels
(1950), and very recently Mallows (1957) has developed a new form of approach related to
paired-comparison theory. In the present paper we start from the assumption that the
n pairs of rankings $u_i$, $v_i$ have arisen as the rank numbers in a sample of n pairs of correlated
normal variates. Thus, if $x_i$, $y_i$ (i = 1, 2, ..., n) denote a random sample of n paired
observations from a bivariate normal population having correlation coefficient $\rho$, we suppose
that the $x_i$ are arranged in order of magnitude and that $v_i$ is the rank of $y_i$. This model
has received considerable attention and a certain number of theoretical results are known.


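Thus we have

$$E(r_S) = \frac{6}{\pi(n+1)} \left\{ \sin^{-1}\rho + (n-2)\sin^{-1}\tfrac{1}{2}\rho \right\} \quad \text{(Moran, 1948)}, \tag{5}$$

$$\operatorname{var}(r_S) = \frac{1}{n}\left\{1 - 1.563465\rho^2 + 0.304743\rho^4 + 0.155286\rho^6 + 0.061552\rho^8 + 0.022099\rho^{10} + \ldots\right\}. \tag{6}$$

Equation (6) is a large sample approximation due to Kendall (1949) and David, Kendall &
Stuart (1951). As we shall see below, it does not appear to be very accurate when the
sample size is as small as 10. Turning to $r_K$, we have

$$E(r_K) = \frac{2}{\pi}\sin^{-1}\rho \quad \text{(Greiner, 1909)}, \tag{7}$$

$$\operatorname{var}(r_K) = \frac{2}{n(n-1)}\left\{1 - \left(\frac{2}{\pi}\sin^{-1}\rho\right)^2 + 2(n-2)\left[\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho\right)^2\right]\right\} \quad \text{(Esscher, 1924)}. \tag{8}$$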
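
The expectation formulae and the variance of Kendall's coefficient are easy to evaluate. The sketch below is an illustration that assumes the readings of (5), (7) and (8) as $E(r_S) = \frac{6}{\pi(n+1)}\{\sin^{-1}\rho + (n-2)\sin^{-1}\frac{1}{2}\rho\}$, $E(r_K) = \frac{2}{\pi}\sin^{-1}\rho$ and $\operatorname{var}(r_K) = \frac{2}{n(n-1)}\{1 - (\frac{2}{\pi}\sin^{-1}\rho)^2 + 2(n-2)[\frac{1}{9} - (\frac{2}{\pi}\sin^{-1}\frac{1}{2}\rho)^2]\}$; the function names are hypothetical.

```python
from math import asin, pi


def mean_rs(rho, n):
    """Expectation of Spearman's coefficient under bivariate normality."""
    return 6.0 / (pi * (n + 1)) * (asin(rho) + (n - 2) * asin(rho / 2.0))


def mean_rk(rho):
    """Expectation of Kendall's coefficient under bivariate normality."""
    return 2.0 / pi * asin(rho)


def var_rk(rho, n):
    """Variance of Kendall's coefficient under bivariate normality."""
    a = (2.0 / pi * asin(rho)) ** 2
    b = (2.0 / pi * asin(rho / 2.0)) ** 2
    return 2.0 / (n * (n - 1)) * (1.0 - a + 2.0 * (n - 2) * (1.0 / 9.0 - b))
```

Two checks follow from the formulae themselves: at ρ = 1 both expectations equal 1 and the variance vanishes, and at ρ = 0 the variance reduces to 2(2n + 5)/(9n(n − 1)), the familiar value for independent rankings.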


As far as we are aware, no results are available for the higher moments or cumulants of
$r_S$ or $r_K$, but Sundrum (1953) showed how the third and fourth moments of $r_K$ might be
obtained in the general case. He also used some random sampling results to give empirical
values for these moments, assuming underlying normal correlation, in the single case $\rho = 1/\sqrt{2}$.

As can be seen from equations (6) and (8), the standard deviations of $r_S$ and $r_K$ change
with $\rho$. Further, as might be anticipated from the parallel case of the product moment
correlation coefficient $r_{xy}$, the shapes of the sampling distributions are found to change
with $\rho$. Thus, when we get away from the problem of using rank coefficients in tests
of independence, we at once run into difficulties. The lack of results for dependent
rankings has made it difficult to compare the relative merits of different rank coefficients
in detecting dependence, nor has it been possible to use these coefficients for a comparison
of correlation in different populations. If we accept the underlying bivariate normal
structure, then we are faced with the distributional problem; if we do not accept this, then we
have also to look for a simple definition of non-parametric dependence.


(1·3) The present results and their bearing on these difficulties


While we do not claim to have solved all these difficulties we hope, in this paper, to have
compiled evidence which shows that the problem is capable of a simple solution when the
rankings arise from the class of population models specified below. We proceed as follows:

(A) We start with rankings generated by sampling from a bivariate normal parent with
correlation $\rho$. With the help of extensive sampling experiments backed by analytical
approximation, we show that if n is not too large the z-transforms

$$z_S = \tanh^{-1} r_S = \tfrac{1}{2}\log_e \frac{1 + r_S}{1 - r_S}, \qquad z_K = \tanh^{-1} r_K \tag{9}$$

are approximately normally distributed with variances nearly independent of $\rho$. In fact

$$\operatorname{var}(z_S) \simeq \frac{1.060}{n - 3}, \qquad \operatorname{var}(z_K) \simeq \frac{0.437}{n - 4}. \tag{10}$$

The expectation of $z_K$ can be expressed approximately as a simple function of $\rho$, by making
use of the expressions for $E(r_K)$ and var($r_K$) given in (7) and (8). The approximation for the
expectation of $z_S$ is less satisfactory in small samples owing to the inadequacy of the
expression (6) for var($r_S$)*. It should be noted, however, that, just as in using the z-transformation
for $r_{xy}$, a knowledge of the precise expectation of the transformed variable is
not necessary in a number of the test procedures that become available.

(B) The results in A can clearly be extended to a much wider class of parental distributions.
If we start from a bivariate normal distribution of x, y and introduce new variates
X = f(x), Y = g(y), the rankings of X and Y will clearly be identical with those of x and y
provided the functions f and g are monotonic,

apply to rankings generated by the wider class 
sely, starting from an 


ariates x and y. : 

ink i 
normal, but we think field in 
this form. This is a ^! 


rated 
; we may state that if the rankings are gene f the 


in equations (10). Further de á 
Sforms is an unbiased esti! 


b a 
sted 

€ dependence have been ener dere 

ution. Such models are not co 


" models of non-parametrj 


not generated by a parental bivariate distrib 


here. 
significance may be applied to the z values to determine whether two or more samples are
likely to have come from populations with a common $\rho$.

(D) Within these conditions it is possible to make approximate comparisons of the
relative merits of the rank coefficients $r_S$ and $r_K$ (and later we hope of $r_F$). In particular,
we may compare their power in detecting differences in population $\rho$ values.
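
One way such a comparison might be carried out for two independent samples is sketched below. It is an illustration under assumptions rather than the authors' procedure: it takes the variance approximations in (10) to read 1.060/(n − 3) for $z_S$ and 0.437/(n − 4) for $z_K$, treats the transformed values as normal, and refers the standardized difference to the unit normal distribution. The names `z_transform`, `var_z` and `compare_two_samples` are hypothetical.

```python
from math import atanh, sqrt

# Assumed reading of (10): variance of z is c / (n - k) for each coefficient.
VARIANCE_CONSTANTS = {"spearman": (1.060, 3), "kendall": (0.437, 4)}


def z_transform(r):
    """Inverse hyperbolic tangent, equal to 0.5 * log((1 + r) / (1 - r))."""
    return atanh(r)


def var_z(n, kind):
    """Approximate variance of the transformed coefficient for sample size n."""
    c, k = VARIANCE_CONSTANTS[kind]
    return c / (n - k)


def compare_two_samples(r1, n1, r2, n2, kind):
    """Standardized difference of two transformed rank coefficients."""
    difference = z_transform(r1) - z_transform(r2)
    return difference / sqrt(var_z(n1, kind) + var_z(n2, kind))
```

For instance, `compare_two_samples(0.7, 30, 0.3, 30, "spearman")` gives a positive deviate to be judged against normal tables; the knowledge of the exact expectation of z is not needed because it cancels when the populations share a common ρ.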


2. THE EXPERIMENTAL DISTRIBUTIONS OF SPEARMAN'S AND KENDALL’S COEFFICIENTS 
(2·1) The distributions of $r_S$ and $r_K$

The experimental sampling made full use of the 25,000 sets of correlated normal deviates
referred to in § 1·1. Thus, we had 2500 samples with n = 10, 833 with n = 30 and 500 with
n = 50. For each value of n we had samples from nine bivariate normal populations, namely
those with $\rho$ = 0.1(0.1)0.9. The samples of 10, 30 and 50 were independent in the sense
that the 25,000 cards containing the basic data were re-shuffled between each of the three
experiments. The basic calculations for our study were all carried out in the Mathematics
Division of the National Physical Laboratory. The samples were formed and ranked on the
Division's punched card installation under the supervision of Miss M. U. Thomas. She was
responsible, also, for the calculation of all the values of $S_S$ (of equation (1)) and $S_F$ (the
numerator on the right-hand side of equation (4)) and for that of $P_K$ (of equation (3)) for
samples of size 10. The values of $P_K$ for samples of sizes 30 and 50 were obtained on the Deuce
digital computer by Mr T. Vickers and Mr B. W. Munday. An account of the methods used
will be given in a later paper; we plan also to print the observed frequency distributions
corresponding to the various coefficients.

Comparison of the observed mean values of $r_S$ with the theoretical values of equation (5)
and of the means and variances of $r_K$ with equations (7) and (8) is only useful as a check on
the representative character of the random samples. This check has been made and passed
satisfactorily; the observed values are not reproduced here. Examination of the variance
of $r_S$ is however necessary, equation (6) giving an approximation only to the true value.

(2·2) The variance of $r_S$

The Kendall formula (6) does not give the correct values of 1/(n − 1) and 0 to var($r_S$)
when $\rho$ = 0 and 1, respectively. A purely empirical adjustment is obtained by substituting
n − 1 for n as divisor and adding a term $+0.019785\rho^{12}$ which reduces the variance to zero
when $\rho$ = 1, so that we have

$$\operatorname{var}(r_S) = \frac{1}{n-1}\left\{1 - 1.563465\rho^2 + 0.304743\rho^4 + 0.155286\rho^6 + 0.061552\rho^8 + 0.022099\rho^{10} + 0.019785\rho^{12}\right\}. \tag{11}$$
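
The adjusted formula is simple enough to tabulate directly. The following Python sketch is an illustration (the function and constant names are hypothetical); it evaluates the series with the coefficients quoted in (6) and the extra twelfth-power term, divided by n − 1, so that the output can be set beside the 'From (11)' columns of Table 1.

```python
# Coefficients of rho^0, rho^2, ..., rho^10 as quoted in formula (6).
KENDALL_COEFFICIENTS = (1.0, -1.563465, 0.304743, 0.155286, 0.061552, 0.022099)
# Empirical coefficient of rho^12 added in formula (11).
EXTRA_COEFFICIENT = 0.019785


def var_rs_modified(rho, n):
    """Modified Kendall approximation to the variance of Spearman's coefficient."""
    series = sum(c * rho ** (2 * k) for k, c in enumerate(KENDALL_COEFFICIENTS))
    series += EXTRA_COEFFICIENT * rho**12
    return series / (n - 1)
```

Evaluating `var_rs_modified(0.1, 10)` gives 0.1094 to four decimal places and `var_rs_modified(0.9, 10)` gives 0.0062, in agreement with the first and last entries of the first column of the table; at ρ = 1 the coefficients sum to zero, as the adjustment intends.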


Table 1 contains for each of the three sample sizes, (a) the estimated variance from equation
(11), (b) the observed variance from the sampling experiment, (c) smoothed values of (b)
obtained by a rough graphical process. These last values are made use of in § 3.2 below. It
will be seen that for n = 10, the modified Kendall formula (11) gives values which for
ρ > 0.3 are consistently smaller than the observed values. The theoretical approximation
is also too small, but less noticeably so, when n = 30. It seems clear that for small samples
var (r_S) cannot be accurately expressed as the pro... when approximating to the variance of z_S
... smoothed observed values of var (r_S) taken from the third columns of Table 1.

Table 1. Variance of Spearman's r_S

      |           n = 10            |           n = 30            |           n = 50
  ρ   | From (11)  Obs.    Smoothed | From (11)  Obs.    Smoothed | From (11)  Obs.    Smoothed
      |                    obs.     |                    obs.     |                    obs.
 0.1  | 0.1094   0.1061   0.1093   | 0.0339   0.0334   0.0342   | 0.0201   0.0192   …
 0.2  | 0.1042   0.1041   0.1055   | 0.0323   0.0338   0.0336   | 0.0191   0.0215   …
 0.3  | 0.0958   0.1002   0.1002   | 0.0297   0.0321   0.0317   | 0.0176   0.0192   …
 0.4  | 0.0843   0.0916   0.0923   | 0.0261   0.0263   0.0273   | 0.0155   0.0160   …
 0.5  | 0.0701   0.0801   0.0805   | 0.0218   0.0227   0.0225   | 0.0129   0.0133   0.013…
 0.6  | 0.0539   0.0638   0.0644   | 0.0167   0.0181   0.0172   | 0.0099   0.0110   …
 0.7  | 0.0366   0.0443   0.0470   | 0.0114   0.0117   0.0117   | 0.0067   0.00604  …
 0.8  | 0.0199   0.0322   0.0303   | 0.0062   0.00674  0.0067   | 0.0037   0.00348  0.0…
 0.9  | 0.0062   0.0125   0.0130   | 0.0019   0.00241  0.0024   | 0.0011   0.00111  0.00…


3. THE TRANSFORMATION OF THE RANK CORRELATION COEFFICIENTS

(3.1) The transformation and its justification

Our object is to find transformations of r_S and r_K which will give variances approximately
independent of ρ and will at the same time make the distributions roughly normal. The
basic distributions of r_S and r_K become increasingly skew as |ρ| → 1. It is natural that we
should consider R. A. Fisher's z transform which proved so successful in the case of the
product moment correlation coefficient in normal samples. If we write in general

$$z = \tanh^{-1} r = \tfrac{1}{2}\log_e \frac{1+r}{1-r}, \tag{12}$$

then the cumulants of z may be expanded in series in terms of the cumulants of r. The
leading terms of the expansions for the mean and variance of z are given in equations (13)
and (14):

$$E(z) = \tfrac{1}{2}\log_e \frac{1+\bar r}{1-\bar r} + \frac{\bar r\,\kappa_2(r)}{(1-\bar r^{2})^{2}} + \dots, \tag{13}$$

$$\operatorname{var}(z) = \frac{\kappa_2(r)}{(1-\bar r^{2})^{2}} + \dots, \tag{14}$$

where $\bar r = \kappa_1(r) = E(r)$.


The distribution of r depends only on the single parameter ρ and it will be seen that
to a first approximation the z transformation may be ex... exactly from equations (5), ...
smoothed observed values from Ta... In particular, there is a defin... κ₁(r) and κ₂(r) ...

Table 2. First approximations to the variance of z_S and z_K

                    |  var (r_S)/(1 − r̄_S²)² *    |  var (r_K)/(1 − r̄_K²)² †
  ρ                 | n = 10   n = 30   n = 50   | n = 10    n = 30    n = 50
 0.1                | 0.111    0.035    0.021    | 0.0618    0.0166    0.00952
 0.2                | 0.112    0.036    0.021    | 0.0619    0.0166    0.00950
 0.3                | 0.115    0.037    0.022    | 0.0622    0.0166    0.00947
 0.4                | 0.120    0.037    0.022    | 0.0627    0.0165    0.00943
 0.5                | 0.124    0.037    0.022    | 0.0634    0.0165    0.00937
 0.6                | 0.126    0.037    0.023    | 0.0644    0.0164    0.00930
 0.7                | 0.130    0.038    0.022    | 0.0662    0.0164    0.00920
 0.8                | 0.141    0.039    0.022    | 0.0697    0.0164    0.00910
 0.9                | 0.155    0.043    0.022    | 0.0787    0.0168    0.00905
 Mean, ρ = 0.1 to 0.8 | 0.1224   0.0370   0.0219   | 0.06404   0.01649   0.00936

* var (r_S) obtained from smoothing the experimental values.
† var (r_K) is the correct theoretical value.


Frequency tables of the distributions of z_S and z_K have been obtained and the following
sections are concerned with comments on the mean values and variances obtained from
these tables and with the normality of the distributions.

(3.2) The mean values of z_S = tanh⁻¹ r_S and z_K = tanh⁻¹ r_K

In Table 3 we compare:
(a) the observed mean values of z_S found from the experimental data;
(b) the approximate theoretical value of E(z_S) given by the first two terms of (13), namely

$$E(z_S) = \tanh^{-1}\bar r_S + \bar r_S \operatorname{var}(r_S)/(1-\bar r_S^{2})^{2}, \tag{15}$$

where r̄_S is calculated from (5) and var (r_S) is the smoothed observed value already referred to;
(c) the second or 'corrective term' from the right-hand side of (15).
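
As an illustration of items (b) and (c), the sketch below evaluates the two-term approximation for one cell of Table 3. It is not part of the original paper. Equation (5) is not reproduced in this part of the text, so `mean_rs` assumes the usual expression for the expectation of Spearman's coefficient in bivariate normal samples; that form, and both function names, are assumptions made for the example.

```python
import math


def mean_rs(rho, n):
    """Assumed form of equation (5): E(r_S) for bivariate normal samples of size n."""
    return 6.0 / (math.pi * (n + 1)) * (
        math.asin(rho) + (n - 2) * math.asin(rho / 2.0)
    )


def approx_mean_z(r_bar, var_r):
    """Two-term approximation to E(z); returns (approximate theory, corrective term)."""
    corrective = r_bar * var_r / (1.0 - r_bar ** 2) ** 2
    return math.atanh(r_bar) + corrective, corrective


if __name__ == "__main__":
    # rho = 0.1, n = 10, with the smoothed observed var(r_S) = 0.1093 from Table 1.
    theory, corrective = approx_mean_z(mean_rs(0.1, 10), 0.1093)
    print(round(theory, 3), round(corrective, 3))
```

Under these assumptions the two numbers can be compared with the "Approx. theory" and "Corr. term" entries of Table 3 for ρ = 0.1, n = 10.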
Owing to the fact that in a few samples of 10, the rankings of the two variates were in
perfect agreement, some values of r_S (and r_K) are unity and the corresponding z_S (and z_K)
become infinite.* The means and variances tabled omit these observations which in any
case form a very small part of a distribution of 2500 observations. We first, however, made
estimates of the mean and variance of z, using the technique for a censored distribution, but
the difference in results was not large enough to be of importance. Having regard to the
standard errors quoted below the table† it will be seen that the differences between
observation and approximate theory are not significant except perhaps in the case of ρ = 0.9.
The corrective term is of some importance in small samples with large ρ, but is steadily
reduced in importance as n increases. In the case of the transformed product moment
correlation coefficient a similar, if less important, effect is present. Table 4 gives similar
results for the mean values of z_K, except that in this case the true values of var (r_K) may be

* This happened in seven cases for ρ = 0.9, in three cases for ρ = 0.8 and once for ρ = 0.7.
† The average values of σ_z given below Table 5 for z_S and Table 6 for z_K have been used and
N = 2500, 833 and 500, respectively.



Table 3. Mean values of z_S

      |          n = 10           |          n = 30           |          n = 50
  ρ   | Obs.    Approx.   Corr.   | Obs.    Approx.   Corr.   | Obs.    Approx.   Corr.
      |         theory    term    |         theory    term    |         theory    term
 0.1  | 0.094   0.097     0.010   | 0.097   0.096     0.003   | 0.096   0.096     0.002
 0.2  | 0.195   0.195     0.019   | 0.195   0.194     0.007   | 0.191   0.194     0.004
 0.3  | 0.304   0.299     0.030   | 0.297   0.296     0.010   | 0.297   0.296     0.006
 0.4  | 0.416   0.409     0.042   | 0.410   0.405     0.014   | 0.406   0.405     0.008
 0.5  | 0.526   0.529     0.055   | 0.517   0.525     0.017   | 0.522   0.525     0.010
 0.6  | 0.671   0.665     0.068   | 0.661   0.661     0.021   | …       0.663     0.013
 0.7  | 0.842*  0.825     0.082   | 0.833   0.826     0.025   | 0.838   0.828     0.014
 0.8  | 1.032*  1.038     0.103   | 1.051   1.043     0.030   | 1.056   1.047     …
 0.9  | 1.374*  1.361     0.131   | 1.406   1.389     0.038   | 1.417   1.399     0.018

Approximate theory = tanh⁻¹ r̄_S + r̄_S var (r_S)/(1 − r̄_S²)², using r̄_S from equation (5) and the
smoothed observed var (r_S).
Corrective term = second term in expression for approximate theory.
Standard errors of observed means: for n = 10 about 0.008; for n = 30, 50, about 0.007.


Table 4. Mean values of z_K

      |          n = 10           |          n = 30           |          n = 50
  ρ   | Obs.    Approx.   Corr.   | Obs.    Approx.   Corr.   | Obs.    Approx.   Corr.
      |         theory    term    |         theory    term    |         theory    term
 0.1  | 0.065   0.068     0.004   | 0.066   0.065     0.001   | 0.065   0.064     0.001
 0.2  | 0.135   0.137     0.008   | 0.131   0.131     0.002   | 0.128   0.130     0.001
 0.3  | 0.209   0.209     0.012   | 0.199   0.200     0.003   | 0.199   0.198     0.002
 0.4  | 0.289   0.285     0.016   | 0.275   0.273     0.004   | 0.271   0.271     0.002
 0.5  | 0.361   0.368     …       | 0.346   0.352     0.005   | 0.346   0.350     0.003
 0.6  | 0.465   0.462     0.026   | 0.439   0.442     …       | …       …         0.004
 0.7  | 0.582*  0.574     0.033   | 0.551   0.549     0.008   | 0.550   0.545     0.005
 0.8  | 0.717*  …         0.041   | …       …         0.010   | 0.689   0.684     0.005
 0.9  | 0.961*  0.949     0.056   | 0.917   0.905     0.012   | 0.909   0.899     0.006

Approximate theory = tanh⁻¹ r̄_K + r̄_K var (r_K)/(1 − r̄_K²)², where r̄_K and var (r_K) are derived from
equations (7) and (8).
Corrective term = second term in expression for approximate theory.
Standard errors of observed means: for n = 10 about 0.0055; for n = 30, 50 about 0.0045.
* Ignoring the few infinite values.

Table 5. Observed variance and standard deviation of z_S

                      |          Variance           |           S.D.
  ρ                   | n = 10    n = 30    n = 50  | n = 10   n = 30   n = 50
 0.1                  | 0.1380    0.0365    0.0204  | 0.371    0.191    0.143
 0.2                  | 0.1407    0.0378    0.0239  | 0.375    0.194    0.155
 0.3                  | 0.1473    0.0405    0.0238  | 0.384    0.201    0.154
 0.4                  | 0.1528    0.0374    0.0226  | 0.391    0.193    0.150
 0.5                  | 0.1537    0.0407    0.0230  | 0.392    0.202    0.152
 0.6                  | 0.1507    0.0389    0.0246  | 0.388    0.197    0.157
 0.7                  | 0.1423*   0.0380    0.0213  | 0.377    0.195    0.146
 0.8                  | 0.1643*   0.0409    0.0227  | 0.405    0.202    0.151
 0.9                  | 0.1700*   0.0465    0.0235  | 0.412    0.216    0.153
 Mean, ρ = 0.1 to 0.8 | 0.14872   0.03884   0.02279 | 0.385    0.197    0.151
 1.03/√(n − 3)        |                             | 0.389    0.198    0.150

Standard errors of S.D.: n = 10, 0.0055; n = 30, 0.0049; n = 50, 0.0047.
Table 6. Observed variance and standard deviation of z_K

                      |           Variance            |           S.D.
  ρ                   | n = 10    n = 30    n = 50    | n = 10   n = 30   n = 50
 0.1                  | 0.06830   0.01700   0.00933   | 0.2613   0.1304   0.0966
 0.2                  | 0.06884   0.01758   0.01074   | 0.2624   0.1327   0.1036
 0.3                  | 0.07290   0.01830   0.01049   | 0.2700   0.1353   0.1024
 0.4                  | 0.07446   0.01639   0.00991   | 0.2729   0.1280   0.0996
 0.5                  | 0.07443   0.01712   0.00966   | 0.2728   0.1308   0.0983
 0.6                  | 0.07384   0.01628   0.00985   | 0.2717   0.1276   0.0992
 0.7                  | …*        0.01514   0.00822   | 0.2636   0.1230   0.0907
 0.8                  | 0.08126*  0.01551   0.00824   | 0.2851   0.1245   0.0908
 0.9                  | 0.08910*  0.01712   0.00780   | 0.2985   0.1308   0.0883
 Mean, ρ = 0.1 to 0.8 | …         0.01667   0.00956   | 0.2700   0.1290   0.0977
 0.6611/√(n − 4)      |                               | 0.2699   0.1297   0.0975

Standard errors of S.D.: n = 10, 0.0038; n = 30, 0.0032; n = 50, 0.0031.
* Ignoring the few infinite values.

derived from equation (8). Again, the differences between observation and the approximation
appear only to be significant when ρ = 0.9. The corrective term is important for n small
and ρ large; it is smaller in proportion than the corresponding term in the approximation
for z_S.

(3.3) The variances and standard deviations of z_S and z_K

Tables 5 and 6 contain the observed variances and standard deviations for the transformed
variables*. Comparison with Table 2 shows that the first term in the expansions for
var (z_S) and var (z_K) is definitely not adequate when n = 10 and still somewhat in defect
for the larger samples. These points are brought out by a comparison of the mean values
given at the bottom of the tables for the eight cases ρ = 0.1 to 0.8. Apart from the extreme
case with ρ = 0.9, the change in the variance of z with ρ is not very great. We shall not
attempt now to discuss the changes further. The figures, however, suggest that for most
practical purposes if ρ < 0.8 it will be justifiable to assume a constant variance for z, for any
given sample size not greatly exceeding 50. The expressions given below are not, however,
to be regarded as asymptotic results.
Assuming that we may use the observed mean values given at the bottom of Tables 5
and 6 we may then look for a general empirical expression for the variance of the form

$$\operatorname{var}(z) = a/(n-b),$$

where b is an integer. We suggest the use of the following:

For Spearman's coefficient

$$\operatorname{var}(z_S) = \frac{1.060}{n-3}, \qquad \sigma_{z_S} = \frac{1.03}{\sqrt{n-3}}.$$

For Kendall's coefficient

$$\operatorname{var}(z_K) = \frac{0.437}{n-4}, \qquad \sigma_{z_K} = \frac{0.66}{\sqrt{n-4}}.$$

The resulting approximations for the standard deviations of z_S and z_K when n = 10, 30
and 50 are given at the bottom of the right-hand side of Tables 5 and 6 where they may be
compared with the individual sampled values and the means of the latter for ρ = 0.1 to 0.8.
We think that except for ρ > 0.8, the approximation can be safely used in tests of
significance for 10 ≤ n ≤ 50, provided, of course, that the underlying conditions discussed in § 1.3
are applicable to the data.
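
One way the empirical variances might be used in a test of significance is sketched below. It is an illustration under stated assumptions rather than a procedure given by the authors: it treats z = tanh⁻¹ r as normal with the suggested standard deviation and compares two independent rank correlations. The function names are hypothetical.

```python
import math


def z_standard_error(n, coefficient="spearman"):
    """Empirical standard deviation of z = atanh(r) suggested for 10 <= n <= 50."""
    if coefficient == "spearman":
        return math.sqrt(1.060 / (n - 3))
    if coefficient == "kendall":
        return math.sqrt(0.437 / (n - 4))
    raise ValueError("coefficient must be 'spearman' or 'kendall'")


def compare_rank_correlations(r1, n1, r2, n2, coefficient="spearman"):
    """Normal deviate and two-sided p-value for the difference of two independent z's."""
    se = math.hypot(
        z_standard_error(n1, coefficient), z_standard_error(n2, coefficient)
    )
    deviate = (math.atanh(r1) - math.atanh(r2)) / se
    p_value = math.erfc(abs(deviate) / math.sqrt(2.0))
    return deviate, p_value


if __name__ == "__main__":
    # Hypothetical data: r_S = 0.60 from 30 pairs against r_S = 0.20 from 25 pairs.
    print(compare_rank_correlations(0.60, 30, 0.20, 25))
```

The sketch inherits the restrictions stated in the text: it should not be relied on when ρ exceeds about 0.8 or when n lies outside the range studied.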


4. NORMALITY OF THE z DISTRIBUTIONS; PRELIMINARY COMMENTS

In the case of n = 10 three difficulties arise in exa... experimental distributions. In the first
... the moments of z; alternatively, moments could be estimated using the technique for
dealing with truncated observations. Secondly, while possible values for r_S and r_K are
equally spaced, the possible values of z_S and z_K occur at intervals which increase with the
z value. For n = 10, where the number of permissi... difficult to know what criterion of normality
... exhibit, particularly for low values of ρ; ... Kendall (1948, p. 47) in the case of
independence. These factors all make it difficult to know how to
assess the importance of excessive values of χ², when comparing the observed distributions
with fitted normal curves. Although we have not yet available all the values of β₂(z), it
appears, as in the case of the z transform of the product moment correlation coefficient,
that the z distributions are somewhat leptokurtic (β₂ > 3).

For n = 30 and 50 we have fitted a certain number of normal curves to the z distributions.
The result of applying the χ² test for goodness-of-fit is summarized in Table 7. Apart from
three values of χ² which are over 30, the fits appear very reasonable. It is clear that the
matter needs further investigation, but we doubt whether even in samples as small as 10,
the assumption of a normal z distribution will lead to any serious misinterpretation of a
significance test.

* For n = 30, 50 the variances tabled are m₂, but for n = 10 they are k₂ = m₂N/(N − 1).


Table 7. Normal curve fits to observed distributions of z_S and z_K

        | z_S; n = 30   | z_S; n = 50   | z_K; n = 30   | z_K; n = 50
  ρ     | χ²     D.F.   | χ²     D.F.   | χ²     D.F.   | χ²     D.F.
 0.1    | 21.1   21     | 16.3   19     | -      -      | -      -
 0.2    | 32.9   22     | 20.6   20     | -      -      | -      -
 0.3    | 37.8   22     | 20.1   20     | -      -      | -      -
 0.4    | 23.7   21     | 23.1   20     | -      -      | -      -
 0.5    | 28.7   22     | 13.1   20     | 22.1   21     | 12.9   21
 0.6    | 11.1   21     | 17.1   20     | -      -      | -      -
 0.7    | 16.6   21     | 29.7   19     | -      -      | -      -
 0.8    | 21.9   22     | 15.6   19     | 26.0   20     | 14.7   21
 0.9    | 23.0   23     | 15.9   20     | 34.8   21     | 22.3   19
 Total  | 216.8  195    | 171.5  177    | 82.9   62     | 49.9   61


5. THE SENSITIVITY OF THE CORRELATION MEASURES TO CHANGES IN ρ

In broad terms the power of discrimination of any one of the possible correlation measures
depends upon the rapidity with which its sampling distributions draw clear of one another
as the population ρ changes. If, for example, for a given value of n, the distribution of r_S
for ρ = 0.2 does not sensibly overlap the distribution for ρ = 0.8, then if a single sample of
n is drawn from each population a test of significance will always establish a difference in
population ρ values. The amount of overlap can of course be seen most directly in the
distributions of r_S and r_K (or S_S and P_K) which we hope to publish later.

If the distributions of the z's were normal with a standard deviation σ_z which is fixed for
a given sample size, the efficiency of discrimination would depend on the way in which the
scale of mean z expressed in standard measure (i.e. E(z)/σ_z) opened out as ρ increased from 0
to 1. Without assuming a constant σ_z, we can obtain a rough measure of local sensitivity
by calculating the ratios (z̄₂ − z̄₁)/√(s²_{z₁} + s²_{z₂}) of
(a) the differences between pairs of consecutive observed means given in Table 3 (or 4), to
(b) the square roots of the sum of the corresponding pair of observed variances from
Table 5 (or 6).

These ratios are given in Table 8 for both Spearman's and Kendall's z. We give also the
corresponding ratios for the product moment correlation coefficient, taking the means and
variances from the full Fisher expansions as corrected by Gayen (1951, p. 236). Having regard
to sampling fluctuations, it is clear that we cannot establish any difference in sensitivity
between the two rank coefficients for n = 10. At n = 30 and 50 and for ρ > 0.5 the ratio for
z_K is consistently larger than for z_S, which suggests a possible advantage for the Kendall
coefficient. More detailed examination of this point is however needed. It will be seen,
as expected, that the product moment coefficient is throughout more sensitive to changes
in ρ than either of the rank coefficients. In all cases for a given difference in ρ, the power
of discrimination increases with ρ.
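
The sensitivity ratio is a one-line calculation. The sketch below (an illustration, with a hypothetical function name) evaluates it for Spearman's z with n = 10 and the pair ρ = 0.1, 0.2, taking the observed means from Table 3 and the observed variances from Table 5.

```python
import math


def sensitivity_ratio(mean1, mean2, var1, var2):
    """(z2_bar - z1_bar) / sqrt(s1^2 + s2^2) for two neighbouring values of rho."""
    return (mean2 - mean1) / math.sqrt(var1 + var2)


if __name__ == "__main__":
    # Spearman's z, n = 10: means 0.094 and 0.195, variances 0.1380 and 0.1407.
    print(round(sensitivity_ratio(0.094, 0.195, 0.1380, 0.1407), 3))
```

The result agrees with the first Spearman entry of Table 8; other entries may differ in the last figure because the tabled means are rounded to three decimals.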


Table 8. Sensitivity ratios (z̄₂ − z̄₁)/√(s²_{z₁} + s²_{z₂}) for different coefficients

           |            n = 10            |            n = 30            |            n = 50
 ρ₁, ρ₂    | Product  Spearman  Kendall   | Product  Spearman  Kendall   | Product  Spearman  Kendall
           | moment                       | moment                       | moment
 0.1, 0.2  | 0.205    0.191     0.188     | 0.383    0.359     0.352     | …        0.452     0.447
 0.2, 0.3  | 0.214    0.202     0.198     | 0.399    0.362     0.359     | 0.523    0.489     0.4…
 0.3, 0.4  | 0.228    0.206     0.208     | 0.427    0.405     0.406     | 0.558    0.503     …
 0.4, 0.5  | 0.250    0.198     0.188     | 0.469    0.384     0.387     | 0.615    0.542     …
 0.5, 0.6  | 0.286    0.263     0.270     | 0.537    …         …         | 0.704    0.655     …
 0.6, 0.7  | 0.345    0.315     0.308     | 0.649    0.622     0.636     | 0.851    0.809     …
 0.7, 0.8  | 0.456    0.345     0.348     | 0.861    0.776     0.805     | 1.113    1.04      …
 0.8, 0.9  | 0.732    0.592     0.592     | 1.39     1.20      1.24      | 1.82     1.68      …

Concluding remarks

Besides putting on record the basic sampling distributions we hope in a further paper
to carry our investigations further in a number of directions, in particular to give parallel
results for the coefficient r_P of equation (4).

We should like to express our great indebtedness to Miss M. U. Thomas and Mr T. Vickers
whose work has already been mentioned, to Mrs Esmé Hill, formerly of the Statistical
Advisory Unit, Ministry of Supply, and to Mrs Maxine Merrington and Miss Janet Hall of
University College London. Finally, we should like to say how much we owe to the
co-operation of the Mathematics Division of the National Physical Laboratory for the facilities
which made this investigation possible.

THE TWO-SAMPLE t-TEST BASED ON RANGE

By P. G. MOORE
University College London

1. INTRODUCTION

A test for the significance of a difference in means, using an estimate of standard deviation based on
sample range, was suggested by Lord (1947) as a quick substitute for Student's t-test in the case where
the two samples are of equal size. Thus, in Table 10 of his paper Lord gives six significance levels for the
ratio |x̄₁ − x̄₂|/½(w₁ + w₂), where x̄₁, x̄₂ are the means and w₁, w₂ the ranges in samples of size n from two
normal populations having a common standard deviation, σ. The null hypothesis is that the population
means, μ₁, μ₂ are equal.

This test has been in considerable use as a quick method of detecting differences in means, particularly
in the industrial field. In a discussion which took place at a weekend conference of the Industrial
Applications Section of the Royal Statistical Society, held in July 1956, speakers asked whether it
would be possible to provide a similar quick test for the case where the sample sizes n₁, n₂ were not equal.
The present paper gives an answer to this question.


When n₁ ≠ n₂ a number of points need special consideration.

(a) The most efficient range estimate of σ is no longer based on the unweighted mean of the two
sample ranges. If for practical convenience we decide to use θ = w₁ + w₂ as our estimating function
rather than φ = w₁ + f(n₁, n₂) w₂ (where f is an appropriate weighting function), it is desirable to know
what loss of power is involved. This will be considered in § 4, where it is shown that the loss is small.

(b) In Table 2 we give, however, two functions which enable the more accurate estimate of σ based
on φ to be obtained if desired.

(c) With n₁ ≠ n₂ there is no longer any point in using Lord's mean range and we shall take as our
test ratio

$$u = \frac{|\bar x_1 - \bar x_2|}{w_1 + w_2}. \tag{1}$$

Table 1 printed at the end of this paper gives values of u corresponding to four different significance
levels, α, used in a two-tailed test, for n₁, n₂ in the range 2 to 20. Computation by
quadrature as used by Lord in the case n₁ = n₂ would now be ... We have adopted the type of
approximation ... and have taken w₁ + w₂ to be distributed as a multiple of χ with degrees of
freedom ν depending on n₁ and n₂. Following this approach ... of the ratio u of equation (1)
in terms of Student's t with mo... The accuracy of this approximation is examined in § 5. We shall
first give two examples illustrating the use of Table 1.


2. ILLUSTRATIONS

Example (i). Two operators, A and B, make determinations of the percentage of ammonia in plant
gas, with the following results:

Operator A   39 35 43 32 36 48 33 33
Operator B   43 44 56 63 46

From these figures can it be said that the two operators are consistently measuring the same thing?
The data give

x̄₁ = 37.375, w₁ = 16, Σ(x₁ᵢ − x̄₁)² = 221.875;
x̄₂ = 50.4, w₂ = 20, Σ(x₂ᵢ − x̄₂)² = 305.2.

Hence

$$u = \frac{50.4 - 37.375}{16 + 20} = 0.362,$$

and from Table 1 with n₁ = 5 and n₂ = 8, u is just beyond the 1% level of significance, remembering
that we are using a two-tailed test here. The usual form of t-test gives

$$t = \frac{50.4 - 37.375}{\sqrt{\dfrac{221.875 + 305.2}{11}\left(\dfrac{1}{8} + \dfrac{1}{5}\right)}} = 3.30.$$

The degrees of freedom, ν, are 11 and t is again just beyond the 1% level of significance. Hence, with
both tests we would have detected a real difference between the two operators.

Example (ii). Twelve children aged 12 years to 12 years 3 months were selected at random from a
school and fed on a special diet of pasteurized milk for 4 months. The gains in weight, in ounces, of the
twelve children over the 4-month period were

7, 17, 53, −2, 27, 41, 37, 35, 10, 12, 9, 38.

Eight children of similar age were selected as 'controls' and not given the diet. The gains in weight were

10, 0, 29, 11, −21, 25, 19, −19.

From these data it is desired to investigate whether the pasteurized milk diet results in a greater increase
in weight over the 4-month period. The figures give

x̄₁ = 23.667, w₁ = 55, Σ(x₁ᵢ − x̄₁)² = 3202.7;
x̄₂ = 6.750, w₂ = 50, Σ(x₂ᵢ − x̄₂)² = 2485.5.

Hence

$$u = \frac{23.667 - 6.750}{50 + 55} = 0.161.$$

Using a one-tailed test, and entering Table 1 with n₁ = 8, n₂ = 12, it is seen that u falls just beyond the
2½% significance point.

If the t-test is used, we have t = 2.085; with ν = 18 this is a value not quite reaching the 2½% level of
significance. Both tests therefore show that there is an increase in mean weight associated with the
pasteurized milk diet and that approximately this is significant at the 2½% level.
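
Both worked examples can be checked with a few lines of code. This is an illustrative sketch, not part of the paper; the function names are hypothetical. For Example (ii) it assumes that the third gain in the diet group is 53 and the sixth gain in the control group is 25, which are the readings consistent with the quoted means and sums of squares.

```python
import math


def range_ratio_u(x, y):
    """Test ratio u = |mean(x) - mean(y)| / (range(x) + range(y)) of equation (1)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return abs(mean_x - mean_y) / ((max(x) - min(x)) + (max(y) - min(y)))


def pooled_t(x, y):
    """Ordinary two-sample Student t with pooled variance, for comparison."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s2 = ss / (n1 + n2 - 2)
    return abs(m1 - m2) / math.sqrt(s2 * (1.0 / n1 + 1.0 / n2))


operator_a = [39, 35, 43, 32, 36, 48, 33, 33]
operator_b = [43, 44, 56, 63, 46]
# Assumed readings (see lead-in): third diet gain 53, sixth control gain 25.
diet = [7, 17, 53, -2, 27, 41, 37, 35, 10, 12, 9, 38]
controls = [10, 0, 29, 11, -21, 25, 19, -19]

if __name__ == "__main__":
    for label, x, y in (("(i)", operator_a, operator_b), ("(ii)", diet, controls)):
        print(label, round(range_ratio_u(x, y), 3), round(pooled_t(x, y), 3))
```

The significance of u still has to be read from Table 1; the code only reproduces the test ratios.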


3. THE RANGE ESTIMATOR OF STANDARD DEVIATION

When the samples are of unequal size, the unweighted sum

$$\theta = w_1 + w_2 \tag{2}$$

does not provide the best estimate of the assumed common population standard deviation, σ. It may
be shown (see David, 1951) that

$$\phi = w_1 + f(n_1, n_2)\,w_2 \tag{3}$$

will provide the range estimate with minimum coefficient of variation, where

$$f(n_1, n_2) = \frac{V_{n_1}}{d_{n_1}} \times \frac{d_{n_2}}{V_{n_2}}, \tag{4}$$

d_n and V_n being the mean and variance of range in a random sample of size n from a normal population
with unit standard deviation. Values of d_n, V_n and their ratio are given by Pearson & Hartley (1954,
Table 20). The unbiased estimate of σ is then

$$g = \frac{w_1 + f(n_1, n_2)\,w_2}{d_{n_1} + f(n_1, n_2)\,d_{n_2}}. \tag{5}$$

As we shall proceed to show, the gain from using φ rather than θ does not appear worthwhile in a test of
significance. As there may, however, be situations where the more accurate range estimate of σ is
preferred, we give in Table 2 at the end of this paper values of f(n₁, n₂) from equation (4) and of the
denominator d_{n₁} + f(n₁, n₂) d_{n₂} of estimator g of equation (5). The values of n₁, n₂ go from 2 to 20,
n₂ corresponding to the larger of the two samples.

4. JUSTIFICATION FOR USING THE UNWEIGHTED SUM OF RANGES

We start by comparing in Table 3 the coefficients of variation of θ and φ for certain combinations of
n₁ and n₂. It will be seen that only when the samples are of very unequal size is there likely to be any
marked loss of efficiency. A more direct method of studying the loss of efficiency resulting from the use
of θ instead of φ in a test of significance is to compare the power functions of the corresponding tests.

Table 3. Ratio of coefficients of variation of θ and φ

          |             n₁
   n₂     |   3       5       7       9
    3     | 1.000     -       -       -
    6     | 1.030   1.002     -       -
    9     | 1.066   1.018   1.003   1.000
   12     | 1.097   1.036   1.013   1.003
   16     | 1.131   1.060   1.029   1.013
   20     | 1.158   1.080   1.043   1.024

Table 4. Power functions of t and modified t-tests

 Sample sizes | Type of test | Values of (μ₁ − μ₂)/σ
              |              |   1       2       3       4       5
 n₁ = 6       | R.M.S.       | 0.244   0.602   0.886   0.984   0.999
 n₂ = 10      | w₁ + fw₂     | 0.242   0.596   0.882   0.983   0.999
              | w₁ + w₂      | 0.241   0.593   0.879   0.982   0.999
 n₁ = 3       | R.M.S.       | 0.249   0.614   0.895   0.987   0.999
 n₂ = 20      | w₁ + fw₂     | 0.245   0.605   0.889   0.985   0.999
              | w₁ + w₂      | 0.241   0.593   0.879   0.982   0.999
 n₁ = …       | R.M.S.       | 0.210   0.504   0.791   0.944   0.991
 n₂ = 4       | w₁ + fw₂     | 0.207   0.492   0.775   0.935   0.989
              | w₁ + w₂      | 0.204   0.482   0.767   0.929   0.986

... The 'equivalent' degrees of freedom ... test, and the degrees of freedom ... in
terms of power may be found ... of Neyman (1935) for the one-tailed test. Results ...
using the root-mean-square estimate of σ, the difference ... We conclude that if a range ...
θ = w₁ + w₂ may be used since ...

5. CALCULATION OF SIGNIFICANT POINTS

Before making extensive use of Patnaik's χ-approximation it was decided to test its accuracy in one case,
with n₁ = 4, n₂ = 16, by also making a full evaluation by quadrature.

(i) Quadrature

It is not necessary to describe the method used in detail. Use was made of the values of the ordinates
of the distributions of range required, kindly supplied by Mr Lord, who had previously derived them
from the basic 5-decimal-place tables of the probability integral of range used by Pearson & Hartley
(1942). As a result, we could find the exact significance points of a normal variable divided by an estimate
of its standard error based on the sum of two independent ranges.


(ii) The χ-approximation

Patnaik's method leads us to identify the first two moments of (w₁ + w₂)/σ, where σ is the population
standard deviation, with the first two moments of cχ/√ν where χ² has ν degrees of freedom. The first two
moments of (w₁ + w₂)/σ are

$$M = d_{n_1} + d_{n_2}, \qquad V = V_{n_1} + V_{n_2}, \tag{6}$$

and these are equated to those of cχ/√ν to give

$$M = c\sqrt{\frac{2}{\nu}}\;\frac{\Gamma\{\tfrac12(\nu+1)\}}{\Gamma(\tfrac12\nu)}, \tag{7}$$

$$V = c^{2} - M^{2}. \tag{8}$$

Expanding the gamma functions by Stirling's formula we obtain

$$k = \frac{V}{M^{2}} = \frac{1}{2\nu} + \frac{1}{8\nu^{2}} - \frac{1}{16\nu^{3}}. \tag{9}$$

As a first approximation

$$\nu^{-1} = -2 + 2\sqrt{1 + 2k}, \tag{10}$$

and a better approximation leads to

$$\nu^{-1} = -2 + 2\sqrt{1 + 2j}, \tag{11}$$

where j = k + 1/(16ν³), the value of ν from (10) being used for j. The expression for c to the same order
of accuracy is

$$c = M\left[1 + \frac{1}{4\nu} + \frac{1}{32\nu^{2}} - \frac{5}{128\nu^{3}}\right]. \tag{12}$$

Hence, if x is a unit normal variable the distribution of x(d_{n₁} + d_{n₂})/(w₁ + w₂) can be approximated to
by Mt/c, where the t-distribution has the 'equivalent' degrees of freedom from (11) above. The
percentage points can be found by interpolating for fractional ν in a table of the percentage points of the
t-distribution and multiplying by the appropriate factor.

For the case where n₁ = 4 and n₂ = 16 the percentage points calculated by the two methods are:

 α (two-tailed) | Quadrature | χ-approximation
 0.10           | 1.744      | …
 0.05           | 2.130      | …
 0.02           | 2.615      | 2.905
 0.01           | 2.971      | …

A comparison of these results with those given by Patnaik (1950, p. 81) for equal sample sizes shows that
the approximation is as good for the unequal sample sizes considered here as it was for the equal sample
sizes. With this check it was felt safe to proceed with the calculation of the percentage points using the
χ-approximation.

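
The moment-matching steps (6) and (9) to (12) can be sketched in code. This is an illustration with hypothetical function names, not the author's own computing procedure. The numerical inputs M = 8.54160 and V = 2.01639 are the values quoted later in the paper for three ranges of eight observations, for which ν is stated to be 18.33. The helper `exact_mean_ratio` evaluates M/c from the gamma-function form of the mean of cχ/√ν and serves only as a check on the series for c.

```python
import math


def equivalent_df(M, V):
    """Equivalent degrees of freedom nu, following equations (9) to (11)."""
    k = V / (M * M)
    inv_nu = -2.0 + 2.0 * math.sqrt(1.0 + 2.0 * k)   # first approximation (10)
    j = k + inv_nu ** 3 / 16.0                        # j = k + 1/(16 nu^3)
    inv_nu = -2.0 + 2.0 * math.sqrt(1.0 + 2.0 * j)   # improved value (11)
    return 1.0 / inv_nu


def scale_factor(nu):
    """c' = c / M from the series in equation (12)."""
    return 1.0 + 1.0 / (4.0 * nu) + 1.0 / (32.0 * nu ** 2) - 5.0 / (128.0 * nu ** 3)


def exact_mean_ratio(nu):
    """M / c = sqrt(2/nu) * Gamma((nu+1)/2) / Gamma(nu/2), the mean of chi/sqrt(nu)."""
    return math.sqrt(2.0 / nu) * math.exp(
        math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
    )


if __name__ == "__main__":
    nu = equivalent_df(8.54160, 2.01639)
    print(round(nu, 2), round(scale_factor(nu), 4))
```

The product `exact_mean_ratio(nu) * scale_factor(nu)` should be very close to 1 for moderate ν, which gives a quick check that the truncated series (12) is adequate.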


6. CALCULATION OF TABLE 1

For each of the 190 pairs of sample sizes, where n₁ and n₂ go from 2 to 20, the following quantities were
calculated:
(i) M and V from d_n and V_n as in (6),
(ii) 1/ν from (10) and (11),
(iii) c′ = c/M from (12).

For each value of ν found from (ii) the percentage points of t for α = 0.10, 0.05, 0.02 and 0.01 were
obtained by interpolation in the tables. Call these values t_{ν,α}. The required percentage points of u
are then

$$u_\alpha = \frac{t_{\nu,\alpha}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}{c'\,(d_{n_1} + d_{n_2})}. \tag{13}$$

... was computed and u_α finally obtained from (13). The values are given
in Table 1 to three places of decimals and it is thought that even in the worst cases, where the
samples are small, or the samples are very unequal in size, the error should not exceed ... in the third.
... were differenced for n₁ given n₂ and for n₂ given n₁, and in the ...
values can be checked against the values given by Lord (1947, Table 10,
p. 66). They should be just half Lord's values since he used ½(w₁ + w₂) for his denominator in u.
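
Equation (13) can be illustrated for n₁ = 4, n₂ = 16 at α = 0.05. In this sketch, which is not part of the paper, the quadrature value 2.130 from § 5 is used in place of t_{ν,α}/c′ (so c′ is set to 1), and d₄ = 2.059, d₁₆ = 3.532 are assumed three-decimal values from standard range tables. The function name is hypothetical.

```python
import math


def u_percentage_point(t_value, c_prime, n1, n2, d_n1, d_n2):
    """Equation (13): u_alpha = t * sqrt(1/n1 + 1/n2) / (c' * (d_n1 + d_n2))."""
    return t_value * math.sqrt(1.0 / n1 + 1.0 / n2) / (c_prime * (d_n1 + d_n2))


if __name__ == "__main__":
    # 2.130 is the quadrature percentage point of x(d_4 + d_16)/(w1 + w2) at alpha = 0.05;
    # it already includes the factor 1/c', hence c_prime = 1.0 here.
    u = u_percentage_point(2.130, 1.0, 4, 16, 2.059, 3.532)
    print(round(u, 3))
```

The result can be compared with the Table 1 entry for n₁ = 4, n₂ = 16 at the 0.05 level.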
... detail modifications of this test which could be obtained, for example, ... ranges of smaller
subsamples. If n₁ = 8 and n₂ = 16 we might ... reliability by splitting the second sample up
randomly into two smaller subsamples, each of eight, giving us three ranges w₁, w₂′ and w₂″. The
expected gain can be gauged by finding the power function for a 5% significance level test as in
Table 4. For the three ranges w₁ + w₂′ + w₂″ we have

M = 3d₈ = 8.54160, V = 3V₈ = 2.01639,

and using equations (10) and (11) ν is found to be 18.33. If the two ranges are used in the same way
as in this paper ν = 16.72, whilst if the root-mean-square estimate of standard deviation is used ...
The powers from Neyman's tables are

                   |   1       2
 R.M.S.            | 0.250   0.615
 w₁ + w₂′ + w₂″    | 0.248   0.611
 w₁ + w₂           | 0.247   0.607

Table 1. Values of u = |x̄₁ − x̄₂|/(w₁ + w₂) exceeded with probability α
(for a single-tailed test α must be halved)

 n₁   n₂ |        Probability (α)         | n₁   n₂ |        Probability (α)
         | 0.10    0.05    0.02    0.01   |         | 0.10    0.05    0.02    0.01
2 2 | 1161 | 1-714 3-958 4 | 16 | 0175 | 0-213 0.299 
3 | 0-693 | 0-915 | 12 1-557 17 | 0172 | 0-210 0-293 
4 | 0356 | 0732 | L 1-242 18 | 0-169 | 0-206 0-288 
5 | 0-478 | 0-619 | © 1-008 19 | 0-166 | 0-203 0-283 
| 30 | 0-164 | 0-200 0-279 
6 0.549 | 0-721 | 0-865 
1 0-502 | 0-052 | 0-776 5 5 | 0-247 | 0-307 0-450 
8 0-469 | 0-603 | 0-713 6 | 0-224 | 0.277 0-402 
9 0-443 | 0-507 | 0-666 7 | 0-208 | 0-256 0-368 
10 | 0.493 | 0-538 | 0-630 8 0-195 0-240 0-343 
186 | 0-22 0-323 
} 1 0407 | 0-515 | 0-601 10 | 0178 | 0-218 0-309 
12 0.393 | 0496 | 0557 s 
13 0.382 | 0-480 | 0-557 11 | 0172 | 0-210 0-296 
14 0.372 | 0-467 | 0-541 12 | 0-167 | 0-204 0-286 
15 0.303 | 0-455 | 0-526 13 | 0-162 0-198 0-277 
14 | 0-15 o 0-270 
16 | 0-287 | 0-356 | 0-445 | 0-513 15 | 0-155 | 0-189 0.263 
17 | 0282 | 0-349 | 0-436 | 0-502 
18 | 0.278 | 0-343 | 0-428 | 0-492 16 | 0-152 | 0-185 | 0-227 | 0.257 
| 19 | 0274 | 0-338 | 0-420 | 0-483 17 | 0149 | 0-182 | 0:222 | 0.252 
| 20 | 0-270 | 0-333 | 0-414 | 0-475 18 | 0-147 | 0-179 | 0218 | 0.248 
19 | 0-144 | 0-176 | 0-215 | 0.244 
3 3 | 0487 | 0-635 | 0-800 | 1-050 20 | 0142 | 0-173 | 0-212 | 0.240 
4 | 0-398 | 0-511 | 0-663 | 0814 ad aa 
5 | 0-339 | 0-429 | 0-556 | 0-660 6 6 | o 0-25 0-312 | 0-359 
7 | 0-188 | 0-240 | 0-287 | 0-329 
i 3 0-501 | 0-590 8 | 0177 | 0-217 | 0-268 | 0-307 
6 | 0-311 | 0-391 6 
1E JEFE IE INE 1E32E E 
8 | O3271 | O338 | 0427 | 0-498 1 . 0-276 
T OS oM US 0.448 11 | 0-155 | 0-189 | 0-233 | 0-265 
a 12 | 0-150 | 0-183 | 0-295 | 0.955 
tp | ase | ois | odes | odii ta | oie | care | care | COE 
13 | 0220 | 0279 | 0:347 0-899 15 | 0-139 | 0169 | 0-207 | 0.235 
15 9218 o Hr 0 330 0-378 16 | 0336 | 0-166 | 0-203 | 0.229 
15 | 0-21 17 | 0134 | 0-163 | 0-199 | 0.295 
7 31 | 0160 | 0-195 
mms mA HEU 19 | 0120 | 0157 | 010 0317 
re | ene ozsa | ou 0.856 20 | 0128 | 0-155 | 0-189 | 0-214 
19 | 0-202 . i 
: 344 7 7 | 0174 | 0-213 | 0-263 | 0-301 
20 | 0-200 | 0-245 |. 0-302 | 0 8 | 0163 | 0-200 | 0246 | 0.281 
9 | 0-155 | 0189 | 0-233 | 0-265 
4 4 0-322 0-407 0-526 0-620 0-181 0-222 
SO Sees | oper RED. | 0028 10 | 0-148 0-252 
i 11 | 0-143 | 0-174 | 0213 | 0242 
ME c rM e i ease 12 | 0338 | 0-168 | 0200 | 0.233 
7 | 0-237 | 0294 | 0-370 - 13 | 0134 | 0-163 | 0-199 | 0.296 
2| | oe | 15 | DIM 14 | 0-131 | 0-159 | 0-194 | 0.290 
9 | ozi3 | 0208 | 0327 | 0:377 15 | 0128 | 0-155 | 0-189 | 0.214 
10 0-204 0-252 0-313 
Ab 16 | 0125 | 0-152 | 0-185 | 0-209 
B | onon | Gee | 030 | MS 17 | 0123 | 0-149 | 0-181 | 0-205 
12 | o1 | 0235 | 0-291 eH 18 | 0121 | 046 | 0-178 | 0-201 
13 | 0-186 | 0-228 | 0-282 | O- T 19 | O119 | 0144 | 0-175 | 0-198 
14 | 0182 | 0:223 | 0275 | 0314 30 | 0117 | 0-142 | 0172 | 0-195 
15 0-178 0-218 0-268 0:30! 
EN 
Table 1 (continued) 


Probability («) Probability (a) 


| 
| 
m No nm | n | 
| 


0-10 0-05 0-02 0-01 0-10 0-05 0-02 | 0-01 
| 
j 8 | 9145 262 | 12 | 12 | 0-107 | 0-130 | 0-158 | 0-178 
io | 039 "d B 0-104 | 0-126 0-153 0-172 
235 | 0-101 0-122 0-148 0-167 
1 | 0133 kis | 15 | 0098 | 0-119 | 0-144 | 0-162 
"BE 317 16 | 0096 | 0-116 | 0140 — 0-158 
14 | 0-122 EM 17 0-004 | OIS: | 0-137 | 0-154 
5 : : 0-092 | O11 | 0-134 | 0-15 
io p Gug 0-199 19 | 0090 | 0-109 | 0-132 | 0-149 
16 | 0-116 -— 20 | 0.089 | 0-107 | 0-130 | 0-146 
17 | 0114 i 
is | oats 0190 | 13 | 13 | 0-100 | o-121 | 0147 | 0-166 
19 | 0-110 p 186 n 0-097 | 0118 | 0-143 0101 
20 | 0-109 ERE 5 | 0095 | 0-115 | 0-139 0-15 
0-180 
9 9 0-137 16 0-092 0-112 0-135 0:152 
10 | 0-131 P 17 | 0090 | 0-109 | 0-132 | 0-149 
2 B 0-089 | 0-107 | 0-130 | 0-146 
ll | 0-126 j 0.087 | 0-105 | 0-127 | 0-143 
12 | 0122 HY 20 | 0-086 | 0103 | 0-125 | 0-140 
it Pr 9197 | 14 | 14 | 0094 | Otia | ogg | 0-156 
18 | 0112 HE 15 | 0092 | 0-111 | 0-135 | 0:151 
16 | 0-110 Fi 16 | 0090 | 0-108 | 0-131 0-147 
182 
17 | 0-107 17 | 0-088 | 0-106 | 91298 | 0-144 
18 | 0106 hice 18 | 0-086 | 0104 | 0-125 | 0141 
19 | 0-104 0-172 9 | 0-084 | 0-102 | 0-123 | 0-138 
20 0-102 0-169 20 0-083 0-101 0-121 0:135 
10 | 10 | 0-125 15 | 15 | 0-089 | 0-108 i 0-147 
0:210 16 | 0087 | o105 | o197 | 0143 
11 | 9120 0-201 17 | 0-085 | 0-103 | o-124 | 0-140 
12 | 0116 0-194 18 | 0083 | 0-101 | 9-122 | 0-137 
13 0112 O87 19 | 0082 | 0-099 | o.119 | 0-134 
5 2 
0-17 
7 | 16 | 16 | 0-085 | o103 | o-194 | 0-139 
16 | 0-104 0-173 x 0:083 | 0-100 | 0121 | 0136 
1 | o102 | 0-169 i9 | COSL | 0-098 | ols | 0-133 
8 | 0100 0-165 0-080 0-096 0-116 0-130 
2 0-098 0-162 20 | 0-078 | 0.094 | o-114 | 0-128 
2 097 -160 
u 9169 | 17 | 17 | 0081 | oogg | otis | 0192 
H | ons 0-193 18 | 0-079 | 0-096 | 0-115 | 0-130 
A 0-111 0-185 B 0-078 0-094 0-113 0-127 
3 6-108 ers 9076 | 0.092 | o-111 | 0-124 
| 15 0-102 oie 18 18 0-077 0-093 0-113 0-126 
$i 39 | 9076 | 0-092 | o-110 | 0124 
i> | one 0-165 0074 | 0-090 | 0-108 | 0-121 
18 | 6. 0-161 19 19 " 
| i nine 20 | 0074 | 0-090 | o-108 | 0-121 
a | ae 0-155 '073 | 0-088 | 0-106 | 011 


0152 | 20 | 99 
Table 2. To assist in estimating a standard deviation from the range estimator 
$g = (w_{n_1} + f(n_1, n_2)\, w_{n_2})/(d_{n_1} + f\, d_{n_2})$ 


$n_1$ $n_2$ $f$ $d_{n_1} + f d_{n_2}$ | $n_1$ $n_2$ $f$ $d_{n_1} + f d_{n_2}$ | $n_1$ $n_2$ $f$ $d_{n_1} + f d_{n_2}$ 
2 3 3-461 5 13 1-804 8-344 10 | ar 1-057 6-431 
4 4-655 14 1-878 8724 | 12 | 1-110 6-694 
5 5-801 15 1-949 9-093 13 | 1-160 6-947 
14 | 1-208 7-192 
6 6-873 16 | 2-016 9-146 15 | 1253 7:427 
7 7-920 17 | 2080 9-789 | 
8 8-903 18 | 2142 | 10-123 16 | 1-297 7-658 
| 9 9-833 19 | 2-201 10-445 17 | 1-338 7878 
10 10-721 20 | 2258 10-760 18 1:377 8-089 
19 | 1-416 8-301 
ll 11-592 6 7 1:105 5-522 20 1-452 8-500 
12 a 8 1-202 5-956 
13 z 9 1-291 6368 | n 12 | 1-050 6-594 
14 15905 10 1:375 6-766 13 1:098 6-836 
15 1 14 | 1143 7-067 
[od ll 1:453 7144 15 1-186 7-291 
16 Toe 12 1-526 7-506 
17 16-1 13 1:595 7:855 16 1:227 7-507 
18 16-766 14 1-661 8:193 17 1-266 7:715 
19 s 15 1-723 8:516 18 1-303 7-916 
20 ^ 19 | 1-340 . 
16 1:782 8-828 20 | 1-374 $306 
3 4 4:246 17 1-839 9-132 
5 5-077 18 | 1-893 9425 | 19 | 13 | 1-045 6-744 
ý 19 | 1-946 9-713 14 | 1-088 6-965 
6 pos 20 | 1-996 9-989 15 | 1-129 7-178 
7 à 
8 7-322 " 8 | 1-087 5-199 16 | 1-168 7-388 
9 7-995 9 | 1-168 6:173 17 | 1-205 7-581 
10 8-639 10 | 1-244 6:532 1a Laa TUS 
1 1:275 7-961 
11 9-207 1 1:314 0-873 20 1:308 8:143 
19 9-868 12 1-380 7-201 
16 Ioe 13 | 1443 7518 | 15 | 14 | 1-041 6-882 
-987 14 | 1-502 7.82 15 1-080 7-086 
15 11-518 18 | L558 | $113 p ‘ 
. 16 | 1-118 7-285 
16 12-035 16 | 1-612 8-398 17 | 1-153 7-473 
5 
7 12-532 17 1-664 8-674 18 | 1187 7-657 
18 13-014 18 | 1-713 8-939 19 | 1-220 7-837 
19 13:494 19 1-760 9-197 20 | 1251 , 8-008 
20 13-936 30 | 1-806 9-449 
| 
5 5 1:037 7-007 
ti Bj om] 4M] a | ul som | ew | 511 [| A | ee 
| " sn 10 | 1144 6-368 | 17 | 1108 | 7-382 
6-018 18 1-140 7:557 
7 11 1-208 6-680 19 1-172 7-730 
8 6-595 .969 6-982 | 7 
s 7-141 12 | 12 asa 20 | 1-202 7-896 
10 7-663 t 1381 1-652 | 
14 . -552 He 
n 8-077 15 | 1433 ESSE 7-800 
-64 | -086 18 1-099 7-472 
12 a is ges 8:336 19 1-129 7.637 
l4 9-554 18 1:575 8-580 20 | 1-159 7-801 
15 9-985 19 1-619 8-819 | 
16 | 2362 | 10402 20 | er y ee | ee IB. 
. le: a | g T 
17 2-437 10-803 | 63244 19 | 1-092 7-560 
18 | 2-509 | 11-192 9 | 10 | oe 6-539 20 | 1420 7-715 
5 11-569 1 1125 
19 | 2578 [s 12 | 1182 6-821 
20 2-645 1r 13 1.235 7-090 17 18 1-030 7.337 
" " pa 5158 14 | 1-286 AM 19 | 1-058 7-491 
b rii pR 15 1-334 7 20 | 1-085 7-640 
8 1:360 6-198 7-844 "i 
9 | 1461 | 6-005 Hd | som | 18 | 20 | Dx DES 
10 1:568 7182 d 1-466 8-306 e ERU 
19 | 1:507 8-529 
ll 1:643 7-539 20 | 1546 $744 | 19 | 20 | 1-096 7-521 
12 1-726 7-950 
A BIBLIOGRAPHY ON THE THEORY OF QUEUES 


By ALISON DOIG 
Research Techniques Unit, London School of Economics 


In this bibliography an attempt has been made to assemble all papers on those aspects of 
the theory of probability which may be grouped together under the heading of the study 
of queues. In addition to purely theoretical work, the practical applications of the subject 
have received much attention, the most important and fruitful of these being in the field 
of telephone traffic. The other main applications have been to the study of road traffic, 
the allocation of operatives to the servicing of machines, the mathematical aspects of 
inventory control and production scheduling, storage problems such as the optimal size 
of dams and miscellaneous topics which include the scheduling of air traffic and the design 
of appointment systems in hospital outpatient departments. The study of point processes 
and, in particular, the problems arising from counts obtained from Geiger-Müller counters, 
resemble closely problems arising in telephone systems where no waiting is possible 
(loss-systems) and a selection of papers on such processes has been included. Similarly, 
problems arising in the theory of dams and provisioning are closely related to the problem 
of collective risk in actuarial studies. In view of the large number of papers on the 
latter topic, some of which are specifically actuarial in character, the choice in this case has 
been limited to a few papers only, these being chosen both for their own intrinsic importance 
and for the fact that they themselves provide references to other work. A similar policy 
has been followed for papers concerned strictly with the economics of inventory control 
and also for papers on renewal theory. 
Classification. The papers are listed in alphabetical order by authors and have been 
classified as belonging to one or more of ten main groups. These are: 

Problems dealing with storage (content). 
Problems relating to flow through a network. 
Applications not covered by the other categories. 
Inventory problems. 
Problems arising in servicing automatic machines. 
Point processes and counter problems. 
The general theory of queues. 
Road traffic and related topics. 
Stochastic processes directly related to the study of queues. 
Problems in telephone traffic. 

Within these groups, a further subdivision has been made i 
results (including tables and graphs) and practical application: 
descriptive articles (d). Where it is rel 
such a description is lacking. For simplicity, variants of the basic queue discipline 
(service in order of arrival) are grouped together under the letter v; such variants include 
queueing with priorities, random choice of the next customer to be served from amongst 
those awaiting service and queues in series in which the customer receives attention at 
more than one counter before he leaves the system. 

Abstracts and short notes are indicated by the letter A and papers with useful lists of 
references by the letter B. Finally, the papers judged to be the most important either by 
reason of their contents or for their survey of a branch of the subject have been marked 
with an asterisk. 


This bibliography was prepared in the Research Techniques Unit of the London School 
of Economics under the direction of Prof. M. G. Kendall and Dr F. G. Foster. The work was 
supported from a grant made to the Unit by the Department of Scientific and Industrial 
Research. The author wishes to thank all those who were of assistance to her in locating 
papers and in particular Dip. Ing. R. Syski for letting her see a pre-publication copy of the 
bibliography prepared by him and Dr J. W. Cohen, and the Royal Statistical Society for 
permission to cite material to appear in subsequent issues of the Journal of the Society, 
Series B. 
Studies in the history of probability and statistics 
VI. A note on the early solutions of the problem of the duration of play 


By A. R. THATCHER 


It is now just 300 years since the publication by Huygens of the first result on the famous problem which 
became known as the Duration of Play. The aim of this note is to summarize the early development of 
this problem and to show how easily some of the solutions found at the beginning of the eighteenth 
century can be linked with modern work on sequential tests, random walks and certain storage problems. 
We use throughout the following notation. Call the two players A and B, and let their chances of 
winning a game be $p$ and $q = 1 - p$, respectively. A starts with $a$ counters and B starts with $b$ counters, and 
after each game the loser hands one counter to the winner. It is desired to find first the probability $P_a$ 
that A will eventually lose all his counters without having previously won all B's, and more generally the 
probability $P_{a,n}$ that this will happen within $n$ games. $P_b$ and $P_{b,n}$ are defined similarly. $P_{a,n} + P_{b,n}$ is 
the probability that the play will terminate (with the 'ruin' of one of the players) within $n$ games. It can 
be shown that the play must end sooner or later, so that $P_a + P_b = 1$. 
In 1657 Huygens gave without proof, in the fifth and last problem of his treatise De ratiociniis in ludo 
aleae, the numerical value for $P_a$ in a case where $a = b = 12$ and where $p$ and $q$ had particular values. 
The general result for $P_a$ was found by James Bernoulli, who died in 1705, but it remained in manuscript 
until it was published 8 years later in his Ars Conjectandi; Bernoulli says that the proof is laborious and 
leaves it to the reader. Before the Ars Conjectandi appeared, however, de Moivre had found a simple 
derivation independently and published it in his treatise De Mensura Sortis (1711). 

De Moivre's original proof, which was later reproduced in his Doctrine of Chances (see 1711, pp. 227-8; 
1718, pp. 23-4; 1738, pp. 45-6; 1756, pp. 52-3), is very ingenious and so much shorter than the demonstrations 
usually given in modern textbooks that it is worth quoting. Its essence is as follows. Imagine 
that each player starts with his counters before him in a pile, and that nominal values are assigned to the 
counters in the following manner. A's bottom counter is given the nominal value $q/p$; the next is given 
the nominal value $(q/p)^2$, and so on until his top counter which has the nominal value $(q/p)^a$. B's top 
counter is valued $(q/p)^{a+1}$, and so on downwards until his bottom counter which is valued $(q/p)^{a+b}$. After 
each game the loser's top counter is transferred to the top of the winner's pile, and it is always the top 
counter which is staked for the next game. Then in terms of the nominal values B's stake is always $q/p$ 
times A's, so that at every game each player's nominal expectation is nil. This remains true throughout 
the play; therefore A's chance of winning all B's counters, multiplied by his nominal gain if he does so, 
must equal B's chance multiplied by B's nominal gain. Thus 

$$P_b\{(q/p)^{a+1} + \dots + (q/p)^{a+b}\} = P_a\{(q/p) + \dots + (q/p)^{a}\}.$$ 

The use of $P_a + P_b = 1$ now gives immediately 

$$P_a = \frac{(q/p)^{a+b} - (q/p)^{a}}{(q/p)^{a+b} - 1}, \qquad (1)$$ 

and this is the probability of the 'gambler's ruin'. 
Now A's total expected gain is $bP_b - aP_a$, while his expectation per game is $p - q$. 
These valuations of the counters represent special cases of a more general result given by de Moivre 
(1718, pp. 35-6; 1738, pp. 48-9; 1756, pp. 55-6). De Moivre does not actually divide $bP_b - aP_a$ by 
$p - q$, but, since the total expectation equals the expectation per game times the expected number of 
games, this division is all that is required in order to get the expected number of games 

$$E(N) = \frac{bP_b - aP_a}{p - q}. \qquad (2)$$ 
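
As an illustrative check on (1) and (2), the following Python sketch evaluates the ruin probabilities and the expected number of games in exact rational arithmetic. The function names are hypothetical. The treatment of the fair case $p = q$, where (1) and (2) become indeterminate, uses the standard limiting values $P_a = b/(a+b)$ and $E(N) = ab$; these are an assumption added here and are not stated in the note.

```python
from fractions import Fraction


def ruin_probabilities(a, b, p):
    """Return (P_a, P_b): the chances that A, respectively B, is eventually ruined.

    Uses formula (1) with r = q/p. The branch for p == q is an added
    assumption (the limiting values), not part of the note.
    """
    q = 1 - p
    if p == q:
        return Fraction(b, a + b), Fraction(a, a + b)
    r = q / p
    p_a = (r ** (a + b) - r ** a) / (r ** (a + b) - 1)
    return p_a, 1 - p_a


def expected_games(a, b, p):
    """Expected duration of play from (2): (b*P_b - a*P_a) / (p - q)."""
    q = 1 - p
    if p == q:
        return a * b  # limiting value for the fair game (assumption)
    p_a, p_b = ruin_probabilities(a, b, p)
    return (b * p_b - a * p_a) / (p - q)


if __name__ == "__main__":
    p = Fraction(2, 3)
    print(ruin_probabilities(2, 1, p))
    print(expected_games(2, 1, p))
```

With $a = 2$, $b = 1$ and $p = 2/3$ the same values can be obtained by solving the two-state recurrence by hand, which gives a quick independent check of the formulae.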

De Moivre was also the first to discover and publish a method for calculating $P_{a,n} + P_{b,n}$, thus 
finding the chance that the play would terminate within $n$ games. For the case where $a$ is infinite (so that 
$P_{a,n} = 0$) and $n - b$ is odd, he found 

$$P_{b,n} = \text{first } \tfrac{1}{2}(n-b+1) \text{ terms of } (p+q)^n + \text{first } \tfrac{1}{2}(n-b+1) \text{ terms of } (p/q)^b (q+p)^n. \qquad (3)$$ 
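
A small numerical check may make (3) more concrete. The Python sketch below (function names are hypothetical) evaluates the two truncated binomial expansions of (3) and compares them with a direct step-by-step calculation of the chance that A's net gain reaches $b$ within $n$ games when A cannot be ruined. It assumes, as in the text, that $n - b$ is odd, and it reads "first terms" as the leading terms in descending powers of the first symbol in each bracket.

```python
from fractions import Fraction
from math import comb


def de_moivre_3(b, n, p):
    """Formula (3): chance that B loses b counters within n games (a infinite, n - b odd)."""
    q = 1 - p
    terms = (n - b + 1) // 2
    # Leading terms of (p + q)^n in descending powers of p.
    first = sum(comb(n, i) * p ** (n - i) * q ** i for i in range(terms))
    # Leading terms of (q + p)^n in descending powers of q.
    second = sum(comb(n, i) * q ** (n - i) * p ** i for i in range(terms))
    return first + (p / q) ** b * second


def first_passage(b, n, p):
    """Same chance by propagating the distribution of A's net gain, absorbing at +b."""
    q = 1 - p
    alive = {0: Fraction(1)}
    absorbed = Fraction(0)
    for _ in range(n):
        nxt = {}
        for pos, pr in alive.items():
            if pos + 1 == b:
                absorbed += pr * p          # A wins and B is ruined
            else:
                nxt[pos + 1] = nxt.get(pos + 1, 0) + pr * p
            nxt[pos - 1] = nxt.get(pos - 1, 0) + pr * q
        alive = nxt
    return absorbed


if __name__ == "__main__":
    p = Fraction(2, 5)
    print(de_moivre_3(3, 8, p), first_passage(3, 8, p))
```

Exact fractions are used so that the two routes can be compared for equality rather than to within rounding error.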
[...] for the case where $n - b$ is even, was given without proof in his [...] 
(1711, p. 262; 1718, pp. 119-20; 1738, p. 179; 1756, pp. 208-9). 
[...] has drawn attention to this result and also provided a simple and elegant proof. 
His first solution of the general problem of calculating $P_{a,n} + P_{b,n}$ when both $a$ and $b$ are finite 
(1711, p. 261; 1718, pp. 113-14; 1738, stated incorrectly on pp. 173-4; 1756, p. 203) consisted 
of $n - 1$ multiplications and the rejection of certain terms during the process. However, 
the calculation is not so tedious as appears at first sight, and it has the advantage of giving the 
chance in the smallest number of terms; as de Moivre later pointed out, the rejected terms can also be 
used to obtain $P_{a,n}$ and $P_{b,n}$ separately. 
A few months before de Moivre's method actually appeared (for the Philosophical Transactions 
for 1711 were delayed in the press), a different solution giving $P_{a,n}$ and $P_{b,n}$ separately was 
found and was soon published by de Montmort (1713). This result is of particular interest because it 
provides one of the easiest solutions of the problem, since the series which can be derived from it by using 
modern tables is rapidly convergent over the range of values of $n$ where the play is likely to end. 
In 1710 de Montmort found a method for calculating $P_{a,n}$ and $P_{b,n}$ for the case $p = q$. He sent his 
numerical results to John Bernoulli, who passed the letter to his nephew Nicholas. In a reply dated 
26 February 1711, published by de Montmort (1713, p. 308 et seq.), Nicholas Bernoulli gave without 
proof the general solution for the case $p \neq q$; in modern notation it can be written as follows: 


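$$P_{b,n} = \sum_{t}\sum_{i}\binom{n}{i}\{p^{\,n-i-ts}q^{\,i+ts} + p^{\,b+i+ts}q^{\,n-b-i-ts}\}$$ 
$$\qquad\; - \sum_{t}\sum_{i}\binom{n}{i}\{p^{\,n-a-i-ts}q^{\,a+i+ts} + p^{\,s+i+ts}q^{\,n-s-i-ts}\}. \qquad (4)$$ 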


In this formula $s = a + b$; the summation over $i \ge 0$ continues until the terms in the series within the curly 
bracket, re-arranged in descending powers of $p$, meet in the middle (the middle term counting only once 
if $n - b$ is even); and the summation over $t$ covers all values $\ge 0$ which leave non-negative exponents 
within the summation over $i$ on the line concerned. Bernoulli stated the result for $n - b$ even, but in 
fact (4) is also valid if $n - b$ is odd. 


Not content with this, Nicholas Bernoulli confirmed that the limit of (4) as $n \to \infty$ gives the correct 
value for $P_b$. He does not give his method but it is not difficult to guess; if for example $p > q$ it is only 
necessary to re-write the two lines of (4) as 


Spats + np"-lg T case prlstogn iato] (3) 
P Zip togara + np”-iq : + pitestaston-2ti-2a—]. 
As $n \to \infty$ the sums in each bracket tend to unity by (James) Bernoulli's Theorem, 
which at the time had not been published but which was known to Nicholas. The expression thus reduces 
to the value given by (1). In passing, it may be noted that 
as $a \to \infty$ the expression (4) reduces to de Moivre's expression (3). 
When de Montmort saw this solution he admitted that he could not follow it (this was 
partly because Bernoulli used a symbol in two senses), and remarked: '[...] 
formule m'étonne pour sa generalité' (1713, p. 316). Later, in comparing it with his own, he said: '[...] 
eu en vüe que la supposition des hazards égaux pour l'un et pour l'autre Joueur, au lieu que vous [...] 
supposés dans un rapport quelconque' (1713, p. 345). De Montmort's solution, which he then describes 
briefly, consisted of a method of picking out the binomial coefficients in (4) from Pascal's triangle; this 
was of course sufficient when $p = q$, and was in itself a remarkable result to have found. Nevertheless it 
seems clear that the solution (4) of the general case $p \neq q$, though often described as de Montmort's, was 
in fact found first by Nicholas Bernoulli. 

De Montmort reproduced (4) in the body of his book, gave an example and added a most interesting 
though far from rigorous demonstration (1713, pp. 268-72). De Moivre at first called the result 'very 
handsom' (1718, p. 122), but [...] pursue the matter because his own result gave $P_{a,n} + P_{b,n}$ 
[...] smaller number of terms. [...] other ways, and in the course [...] the first to explore). His results 
[...] (1756, pp. 224-7); he found the 
probability of runs of successes (1738, pp. 243-8; 1756, pp. 254-9), and of course made the original 
derivation of the normal distribution (1738, pp. 235-43; 1756, pp. 243-50). On the Duration of Play 
problem itself he expressed P, „ as a recurring series with fewer terms than (4); and finally he discovered 
the first results on the trigonometrical solution (see Feller, 1950, p. 292, equation 5-7), including the 
asymptotic form for P,,, when a = b and p = q. For fuller details of his work, and of its subsequent 
development by Laplace and many others, the reader is referred to Todhunter (1865) and Fieller (1931). 

It remains to show the link between these early solutions and modern work. This stems from the well-known 
fact that the Duration of Play situation can be regarded as a linear random walk with two absorbing 
barriers, such that the movement of the particle at each jump has a distribution with mean $\mu = p - q$ 
and variance $\sigma^2 = 4pq$. To complete the comparison a simple approximation is required, namely 

$$(p/q) \approx \exp(2\mu/\sigma^2), \qquad (6)$$ 

which can be shown to apply with sufficient accuracy in the cases for which it will be required. 

If then in equations (1) and (2) we make the substitutions (6) and $p - q = \mu$ we shall obtain approximations 
for the probability of absorption at a given barrier, and for the expected number of steps before 
absorption at either barrier, in the corresponding random walk; and under the conditions of the central 
limit theorem these will be valid for all walks with given finite $\mu$ and $\sigma$, provided that the number of steps 
is sufficiently large. It can be seen by inspection that the transformed versions of equations (1) and (2) are 
in fact the same as Wald's approximations for the operating characteristic and average sample number of 
a sequential test, in the form quoted by Page (1954, equations 5, 7). 

We can similarly transform (3), making the normal approximation to the binomial expressions; it will 
be found that the result agrees with that given by Bartlett (1946, equation 8), obtained as the solution of 
a differential equation for the diffusion process. It is of interest to note that the same result can also be 
used to find a quick approximate solution of a storage problem considered in a recent paper by Anis 
(1956). This concerns a reservoir, of unlimited capacity, which has initial water level $x$; this level varies 
each year by an amount distributed with zero mean and unit variance. When $n$ and $x$ are sufficiently 
large we can ignore the end-effects and assume that the probability that the reservoir will run dry within 
$n$ years is approximately the same as the probability that B will lose $b = x$ counters within $n$ trials (where 
$a$ is infinite and $p = q = \tfrac{1}{2}$). By de Moivre's result (3) this probability will be twice the sum of the first 
$\tfrac{1}{2}(n - x + 1)$ terms of $(\tfrac{1}{2} + \tfrac{1}{2})^n$. Hence, for large $n$ and $x$ the probability that the reservoir will not run dry 
within $n$ years can be expressed approximately as $2\int_0^{x/\sqrt{n}} e^{-\frac{1}{2}t^2}/\sqrt{(2\pi)}\,dt$, and it is easy to verify that 
this distribution has the same moment ratios as the limiting values found by Anis. 

Finally, we come to Nicholas Bernoulli's general solution of the Duration of Play. If for any value of 
$t$ either line of (4) is arranged in descending powers of $p$, it will be found to be the sum of multiples of 
two binomial expressions in the same way as (3); see also Fieller (1931, equation 10.1), who proceeds to 
obtain the exact solution of the problem in a convenient form as a series of multiples of incomplete 
beta-functions, and also provides a rigorous proof. 

The application of (6) and the normal approximation to the binomial puts the solution in the simple 
approximate form of a series of multiples of integrals of $e^{-\frac{1}{2}x^2}\,dx$; this series agrees with the 
(exact) result given by Bartlett (1946, equation 17) for the diffusion process. 
In view of the usefulness of this series it is worth repeating here for completeness: 

$$P_{b,n} = F(b) - w(-a)F(b + 2a) + w(-a-b)F(3b + 2a) - w(-2a-b)F(3b + 4a) + \dots, \qquad (7)$$ 

where 

$$F(\lambda) = \Phi\!\left(\frac{n\mu - \lambda}{\sigma\sqrt{n}}\right) + w(\lambda)\,\Phi\!\left(\frac{-n\mu - \lambda}{\sigma\sqrt{n}}\right),$$ 

$$\Phi(x) = \frac{1}{\sqrt{(2\pi)}}\int_{-\infty}^{x} e^{-\frac{1}{2}t^2}\,dt, \qquad w(\lambda) = \exp(2\mu\lambda/\sigma^2).$$ 

The corresponding series for $P_{a,n}$ is found by interchanging $a$ with $b$ and changing the sign of $\mu$ in the 
definitions. 

The series (7) converges rapidly over the range of $n$ where the process is likely to terminate, 
and so (as suggested by Bartlett) provides a rapid approximation for the probability that a particle 
starting at the origin, with a jump distribution having mean $\mu$ and variance $\sigma^2$, will reach $x = b$ (without 
having previously been absorbed at $x = -a$) within $n$ jumps. It can similarly be used to find the chance 
that a linear sequential test will end within $n$ trials, or that a finite reservoir with random net input will 
either dry up or overflow within a given time. 
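
To illustrate how quickly a series of the form (7) can be evaluated, here is a minimal Python sketch. It assumes the reading of (7) in which $F(\lambda)$ is the single-barrier first-passage probability of the diffusion and $w(\lambda) = \exp(2\mu\lambda/\sigma^2)$; the function names, the default number of terms and the comparison routine are all hypothetical additions. The exact two-barrier chance for a simple $\pm 1$ walk is computed by direct propagation so that the two can be compared.

```python
from math import exp, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def series_7(a, b, n, mu, sigma, terms=10):
    """Approximate chance of reaching +b before -a within n jumps, via series (7)."""
    s = a + b
    scale = sigma * sqrt(n)

    def w(lam):
        return exp(2.0 * mu * lam / sigma ** 2)

    def big_f(lam):
        return PHI((n * mu - lam) / scale) + w(lam) * PHI((-n * mu - lam) / scale)

    total = 0.0
    for t in range(terms):
        # Pairs of terms: +w(-ts) F(b + 2ts) and -w(-a - ts) F(b + 2a + 2ts).
        total += w(-t * s) * big_f(b + 2 * t * s)
        total -= w(-a - t * s) * big_f(b + 2 * a + 2 * t * s)
    return total


def exact_two_barrier(a, b, n, p):
    """Exact chance, for a +1/-1 walk, of absorption at +b (before -a) within n jumps."""
    q = 1.0 - p
    alive = {0: 1.0}
    hit_b = 0.0
    for _ in range(n):
        nxt = {}
        for pos, pr in alive.items():
            for step, ps in ((1, p), (-1, q)):
                new = pos + step
                if new == b:
                    hit_b += pr * ps
                elif new != -a:          # mass reaching -a is absorbed and dropped
                    nxt[new] = nxt.get(new, 0.0) + pr * ps
        alive = nxt
    return hit_b


if __name__ == "__main__":
    print(series_7(10, 10, 100, 0.0, 1.0), exact_two_barrier(10, 10, 100, 0.5))
```

For a fair game with $a = b = 10$ and $n = 100$ only the first two or three terms of the series contribute appreciably, and the result lies close to the exact value for the discrete walk; the residual difference reflects the normal approximation rather than truncation of the series.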


Optimal sampling for quota fulfilment 


By N. L. JOHNSON 
University College London 


1. The problem to be discussed in this paper arises in the following way. It is desired to obtain 
a sample from a stratified population in such a way that there are exactly $m_i$ individuals from stratum 
$w_i$ ($i = 1, \dots, k$). It is more convenient to take a random sample from the whole population, and to 
ascertain subsequently the strata to which the chosen individuals belong, than to search for individuals 
belonging to specified strata. Therefore, a first sample of $N$ individuals is chosen without regard to 
stratification and any shortfall is made up by a further set of samples, each restricted to one of the 
deficient strata, and of such a size as to provide the required number of individuals from each of the 
strata. Thus, if the first sample of $N$ contains $n_i$ ($< m_i$) individuals from stratum $w_i$, then the subsequent 
sample from this stratum will contain $m_i - n_i$ individuals but if $n_i \ge m_i$, no subsequent sample from this 
stratum will be required. 
If $c$ is the cost per individual in the first (unrestricted) sample, and $c_i$ the cost per individual for 
a sample restricted to stratum $w_i$, then the expected cost of obtaining the complete sample is 

$$C_1 = cN + \sum_{i=1}^{k} c_i\,\mathcal{E}(m_i - n_i \mid n_i < m_i)\Pr\{n_i < m_i\}, \qquad (1)$$ 

where $n_i$ is the number of observations included in $w_i$ in the first unrestricted sample. If the unused 
individuals in stratum $w_i$ with numbers in excess of requirements are worth $c_i'$ each the expected cost is 

$$C_2 = cN + \sum_{i=1}^{k}\left[c_i\,\mathcal{E}(m_i - n_i \mid n_i < m_i)\Pr\{n_i < m_i\} + c_i'\,\mathcal{E}(m_i - n_i \mid n_i \ge m_i)\Pr\{n_i \ge m_i\}\right]. \qquad (2)$$ 

$C_1$ can, of course, be regarded as a special case of $C_2$. 


2. If it is supposed that the joint distribution of $n_1, n_2, \dots, n_k$ is multinomial with parameters $p_1, p_2, \dots, p_k$ 
(as would be appropriate if sampling from a large population with proportions $p_1, p_2, \dots, p_k$ in strata 
$w_1, w_2, \dots, w_k$, respectively, were being considered) then 

$$\mathcal{E}(m_i - n_i) = m_i - Np_i$$ 

and (2) can be written 

$$C_2 = cN + \sum_{i=1}^{k}(c_i - c_i')\,\mathcal{E}(m_i - n_i \mid n_i < m_i)\Pr\{n_i < m_i\} + \sum_{i=1}^{k} c_i'(m_i - Np_i). \qquad (2a)$$ 

Using Gruder's formula 

$$\sum_{r=0}^{m-1}(r - Np)\binom{N}{r}p^{r}q^{N-r} = -m\binom{N}{m}p^{m}q^{N-m+1},$$ 

$C_2$ can be expressed in the form 

$$C_2 = cN + \sum_{i=1}^{k}(c_i - c_i')\left[(m_i - Np_i)\sum_{j=0}^{m_i-1}\binom{N}{j}p_i^{j}q_i^{N-j} + m_i\binom{N}{m_i}p_i^{m_i}q_i^{N-m_i+1}\right] + \sum_{i=1}^{k} c_i'(m_i - Np_i), \qquad (3)$$ 

where $q_i = 1 - p_i$. 
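
The step from (2a) to (3) rests on the identity quoted as Gruder's formula, and a short numerical check may be helpful. The Python sketch below (function names are hypothetical) computes the quantity $\mathcal{E}(m - n \mid n < m)\Pr\{n < m\}$ for a single stratum in two ways: by direct summation over the binomial distribution, and from the bracketed closed form in (3). Exact fractions are used so the two can be compared for equality.

```python
from fractions import Fraction
from math import comb


def binom_pmf(big_n, r, p):
    """Binomial probability of r successes in big_n trials."""
    return comb(big_n, r) * p ** r * (1 - p) ** (big_n - r)


def shortfall_direct(big_n, m, p):
    """E(m - n | n < m) Pr{n < m} by summing (m - r) Pr{n = r} over r < m."""
    return sum((m - r) * binom_pmf(big_n, r, p) for r in range(min(m, big_n + 1)))


def shortfall_closed(big_n, m, p):
    """The bracketed term of (3): (m - Np) Pr{n < m} + m C(N, m) p^m q^(N-m+1)."""
    q = 1 - p
    below = sum(binom_pmf(big_n, j, p) for j in range(min(m, big_n + 1)))
    return (m - big_n * p) * below + m * comb(big_n, m) * p ** m * q ** (big_n - m + 1)


if __name__ == "__main__":
    p = Fraction(1, 3)
    print(shortfall_direct(10, 4, p), shortfall_closed(10, 4, p))
```

The closed form needs only one cumulative binomial probability and one binomial term, which is what makes (3) convenient when tables of the incomplete beta-function are available.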
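If $N$ is large enough and $|m_i - Np_i|(Np_iq_i)^{-\frac{1}{2}}$ is not too large, approximate expressions for the expected 
cost can be obtained by putting 

$$\mathcal{E}(m_i - n_i \mid n_i < m_i)\Pr\{n_i < m_i\} \approx (Np_iq_i)^{\frac{1}{2}}\,(X_i\Phi(X_i) + Z(X_i)),$$ 

where 

$$X_i = \frac{m_i - Np_i}{(Np_iq_i)^{\frac{1}{2}}}, \qquad Z(X) = \frac{1}{\sqrt{(2\pi)}}e^{-\frac{1}{2}X^2}, \qquad \Phi(X) = \int_{-\infty}^{X} Z(t)\,dt.$$ 

Then 

$$C_2 \approx cN + \sum_{i=1}^{k}(c_i - c_i')(Np_iq_i)^{\frac{1}{2}}\,(X_i\Phi(X_i) + Z(X_i)) + \sum_{i=1}^{k} c_i'(m_i - Np_i). \qquad (4)$$ 

Approximate values of $N$ minimizing $C_2$ can be found by equating the differential coefficients of (4), 
with regard to $N$, to zero. The resulting equation is 

$$\sum_{i=1}^{k}(c_i - c_i')\left[p_i\,\Phi(X_i) - \tfrac{1}{2}\left(\frac{p_iq_i}{N}\right)^{\frac{1}{2}} Z(X_i)\right] = c - \sum_{i=1}^{k} c_i'p_i. \qquad (5)$$ 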

3. However, an exact approach leads to a simpler solution in the present instance. If $N$ is increased by 
unity, the probability that the extra observation comes from $w_i$ is $p_i$. The cost of obtaining the extra 
observation is $c$, but there is an amount $c_i$ to set against this if $w_i$ is a deficient stratum or $c_i'$ if the first 
sample contains $m_i$ or more individuals from $w_i$. Hence, the change in the expected cost is 

$$\Delta C_2 = c - \sum_{i=1}^{k} p_i[(c_i - c_i')\Pr\{n_i < m_i\} + c_i']. \qquad (6)$$ 

Optimal values of $N$ are obtained as the least value of $N$ for which $\Delta C_2 > 0$. These optimal values will 
approximately satisfy the equation $\Delta C_2 = 0$, i.e. 

$$\sum_{i=1}^{k}\frac{(c_i - c_i')p_i}{c - \sum_{j=1}^{k} c_j'p_j}\Pr\{n_i < m_i\} = 1, \qquad (7)$$ 

or 

$$\sum_{i=1}^{k} g_i\Pr\{n_i < m_i\} = 1, \qquad (7a)$$ 

where 

$$g_i = \frac{[(c_i/c) - (c_i'/c)]\,p_i}{1 - \sum_{j=1}^{k}(c_j'/c)\,p_j}.$$ 

If $c_i' = 0$, $g_i = (c_i/c)p_i$. 

In the special case of complete symmetry where 

$$p_i = 1/k; \quad c_i/c = d; \quad c_i'/c = d'; \quad m_i = m,$$ 

then $g_i = \dfrac{d - d'}{k(1 - d')}$ and (7a) becomes 

$$\Pr\{n_i < m\} = \frac{1}{kg_i} = \frac{1 - d'}{d - d'}. \qquad (8)$$ 

4. Table 1 gives optimal values of $N$ for this symmetrical case for 

$k = 2(1)10$, 
$m$ = nearest integer to $50k^{-1}$, $100k^{-1}$, $200k^{-1}$, $500k^{-1}$, 
$c_i/c = d = 1.25, 1.5, 2.0, 2.5, 3.0$, 
$c_i'/c = d' = 0.9, 0.7, 0.25, 0$. 

Values of $c_i/c$ greater than one and of $c_i'/c$ less than one, were chosen for the following reasons. If any 
$c_i$'s are less than (or equal to) $c$ it is best to obtain the required sample of $m_i$ from those strata for which 
$c_i \le c$ by restricted sampling, and then to use the optimal $N$ for the remaining strata. Similarly, if all 
the $c_i'$'s are greater than $c$ the expected cost decreases without limit as $N$ increases. 

The optimal values were obtained by using the tables of the incomplete beta-function (ed. 
K. Pearson) as far as the range of argument of these tables extended. This method could be used 
for nearly all combinations of $d$ and $d'$ when $km = 50$ and for some values of $d$ and $d'$ when $km = 100$ and 
$k = 2$. For higher values of $km$ (the total size of the required sample) calculations were based on the 
approximate formula 

$$\frac{k(m - \tfrac{1}{2}) - N}{(N(k-1))^{\frac{1}{2}}} = \lambda_\alpha, \qquad (9)$$ 

where $\lambda_\alpha$ is defined by $\dfrac{1}{\sqrt{(2\pi)}}\displaystyle\int_{-\infty}^{\lambda_\alpha} e^{-\frac{1}{2}t^2}\,dt = \alpha$, and $\alpha$ denotes the right-hand side of (8). 
Equation (9) leads to 

$$N \approx k(m - \tfrac{1}{2})\,[(1 + y^2)^{\frac{1}{2}} - y]^2, \qquad (10)$$ 

where 

$$y = \tfrac{1}{2}\lambda_\alpha\left(\frac{k-1}{k(m - \tfrac{1}{2})}\right)^{\frac{1}{2}}.$$ 

As the size, $km$, of the required sample increases, $y$ tends to zero and 

$$N \approx k(m - \tfrac{1}{2})(1 - 2y + 2y^2) = k(m - \tfrac{1}{2}) - \lambda_\alpha(k-1)^{\frac{1}{2}}(k(m - \tfrac{1}{2}))^{\frac{1}{2}} + \tfrac{1}{2}\lambda_\alpha^2(k-1). \qquad (11)$$ 

This formula gives values for $N$ which are not in error by more than one when $km \ge 50$ and can be used 
with reasonable confidence when $km \ge 100$ provided $k$ is not too large. 

Table 1 shows that the optimal size $N$ for the first sample increases with $d$ and with $d'$, as might be 
expected on intuitive grounds. For $k = 2$ the optimum $N$ differs only slightly from the required sample 
size $km$, but as $k$ increases the variation in optimal $N$ with $d$ and $d'$ becomes more pronounced. 
The minimized value of $C_2$ is 


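$$C_{2(\min)} = c[N + k(d - d')\,\mathcal{E}(m - n_i \mid n_i < m)\Pr\{n_i < m\} + kd'(m - (N/k))].$$ 

Since $N$ is chosen to satisfy approximately the equation 

$$\Pr\{n_i < m\} = (1 - d')/(d - d'), \qquad (8\ \text{bis})$$ 

$$C_{2(\min)} \approx c[N + k(1 - d')(m - (N/k) - \mathcal{E}(n_i - (N/k) \mid n_i < m)) + kd'(m - (N/k))],$$ 

whence, by Gruder's formula, 

$$\frac{C_{2(\min)}}{c} \approx km + k(d - d')\,m\binom{N}{m}\left(\frac{1}{k}\right)^{m}\left(1 - \frac{1}{k}\right)^{N-m+1}$$ 

and 

$$\frac{C_{2(\min)}}{mckd} \approx \frac{1}{d} + \left(1 - \frac{d'}{d}\right)\binom{N}{m}\left(\frac{1}{k}\right)^{m}\left(1 - \frac{1}{k}\right)^{N-m+1}. \qquad (12)$$ 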
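
As a rough illustration of how (10), (11) and (12) fit together, the Python sketch below computes an approximate optimal $N$ from the normal deviate $\lambda_\alpha$ and then the cost ratio (12). The function names are hypothetical, $\alpha$ is taken as $(1 - d')/(d - d')$ from (8), and rounding the approximate $N$ to the nearest integer before using it in (12) is an assumption made here for convenience.

```python
from math import comb, sqrt
from statistics import NormalDist


def approx_first_sample_size(k, m, d, d_prime):
    """Return the values of N given by (10) and by its expansion (11)."""
    alpha = (1 - d_prime) / (d - d_prime)       # right-hand side of (8)
    lam = NormalDist().inv_cdf(alpha)           # lambda_alpha in (9)
    big_m = k * (m - 0.5)
    y = 0.5 * lam * sqrt((k - 1) / big_m)
    n_10 = big_m * (sqrt(1 + y * y) - y) ** 2
    n_11 = big_m - lam * sqrt((k - 1) * big_m) + 0.5 * lam ** 2 * (k - 1)
    return n_10, n_11


def cost_ratio(k, m, d, d_prime, big_n):
    """Ratio (12): minimized expected cost over the cost of restricted sampling."""
    p = 1 / k
    tail = comb(big_n, m) * p ** m * (1 - p) ** (big_n - m + 1)
    return 1 / d + (1 - d_prime / d) * tail


if __name__ == "__main__":
    for d in (1.5, 2.5):
        n_10, _ = approx_first_sample_size(2, 25, d, 0.5)
        print(d, round(n_10), round(cost_ratio(2, 25, d, 0.5, round(n_10)), 3))
```

For $k = 2$, $m = 25$, $d' = 0.5$ this gives ratios of about 0.704 for $d = 1.5$ and about 0.437 for $d = 2.5$, in line with the first row of Table 2; for $d = 1.25$, $d' = 0.9$ the approximate $N$ rounds to 53, the corresponding entry of Table 1.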

Values of this ratio are given in Table 2 (calculated by Miss E. J. Smith) for the following values of 
$k$, $m$, $d$ and $d'$: 

$k = 2, 3, 4, 5$, and $10$, 
$m$ = nearest integer to $50k^{-1}$, $100k^{-1}$, $500k^{-1}$, 
$d = 1.5, 2.5, 3.0$, 
Table 1 
— ] 
E SuHUNMMNEERSMT-- 
kb inp 135| 15 | 20 | 25 | $0 | x | Sample | 125| 1:5 | 20 | 25 | 3-0 
(km) d (km) g | 
= 
2 50 0-9 53| 56| 59| 61| 62| 5 50 5 
2 o9 | 56| 62| 67) 7 
0-7 48 | 51 54| 56] 58 0-7 47 53 Ei a2 M 
0-25 44 47 50 52 53 0-25 39 45 5i 54 57 
0 43| 46| 49| 51| 52 0 37| 42| 48| 52] 54 
100 o9 | 106 | 109 | 113 | 116 117 100 0-9 | 109 | 118 | 127 | 132 | 136 
0-7 | 98 | 102 | 107 | 110 | 111 07 | 95 |104 | 113 | 118 | 123 
0.33 | 93 | 97 | 101 | 103 | 105 035 | 85 | 93 101 | 106 | 110 
0 91| 95| 99 | 102 | 103 0 82| 89| 97 | 103 | 106 
200 09 | 207 | 213 | 219 | 222 | 224 200 | 09 |214 | 227 | 235 | 245 | 250 
Q7 | 198 | 204 | 210 | 213 | 216 | O7 |195|207 219 | 227 | 232 
0-25 | 190 | 196 | 202 | 205 | 208 0-25 | 181 | 191 | 203 | 210 | 215 
0 187 | 193 | 199 | 203 | 205 0 | 177 | 186 | 197 | 205 | 210 
500 0-9 | 512 | 521 | 530 | 534 | 538 500 0-9 | 523 | 543 | 561 | 571 | 578 
0-7 | 497 | 506 | 516 | 521 | 525 0-7 | 493 | 512 | 531 | 543 | 551 
0-25 | 483 | 493 | 503 | 509 | 513 0-25 | 468 | 486 | 506 | 517 | 524 
Q | 481 | 489 | 499 | 505 | 509 Q | 461 | 479 | 497 | 509 | 517 
T1 
3 51 09 | 55| 60| 65| 67, 69] 6 48 09 | 55| 61) 68| 73) 77 
o7 | 49| 53| 57| 60| 62 o7 | 44| 51| 58| 61) 63 
025| 43| 47| 51| 54| 56 0-25| 36| 42| 49| 53| 55 
0 42| 45| 49| 52| 54 0 34| 39| 46| 50| 53 
99 09 |106 | 111 | 118 | 121 | 124 102 09 | 112 | 123 | 133 | 139 | 143 
05 | 96 | 102 | 108 | 111 | 115 07 | 97| 106 | 117 | 123 | 127 
0.25 | S9 | 94| 100 | 104 | 108 0-25 | 85| 94| 103 | 109 | 113 
0 87 | 92| 98 |101 | 104 0 s2| 90 99 105 | 109 
201 0-9 | 213 | 222 | 229 | 234 | 237 | 198 0.9 | 213 | 228 | 241 | 249 | 254 
0-7 | 199 | 208 | 216 | 222 | 227 0-7 | 192| 205 | 219 | 228 | 234 
0.25 | 188 | 196 | 205 | 210 | 213 0-25 | 175 | 187 | 201 | 209 | 215 
0 185 | 193 | 201 | 205 | 208 0 170 | 182 | 195 | 203 | 209 
501 0.9 | 518 | 532 | 543 | 550 | 555 498 09 | 524 | 545 | 566 | 577 | 585 
0-7 |496 | 510 | 524 | 532 | 537 0-7 | 490 | 511 | 533 | 545 | 555 
0-25 | 479 | 492 | 505 | 513 | 519 0-25 | 463 | 483 | 504 | 517 | 526 
o | 474 | 486 | 499 | 508 | 513 0 | 455 | 474 | 495 | 508 | 517 
n! 
t 52 o9 | 58] e3| 67] 69| 71| 7 49 os | s7| 6j m m s 
o7 | 49| 54| 60} 63| 65 o7 | 45) 5 0| 63| 67 
0.25 | 43| 47| 53, 56| 58 025| 36| 43| 50, 54, 58 
0 41| 45| 50} 54| 56 0 34| 40! 46| 51| 54 
.g | 108 | 116 | 123 | 128 | 131 98 0-9 | 109 | 120 | 132 | 138 | 143 
"m 04 96 | 104 | 111 | 116 | 119 07 | 92 | 102 | 114 | 120 | 125 
995 | 87| 94| 101 | 106 | 109 035 | 80) 89| 99105, 110 
0 85 91 98 | 103 | 106 0 77 85 94 | 101 | 105 
203 0.9 | 221 | 237 | 253 | 262 | 2 
200 o9 | 212 | 223 oa Fi | 238 07 | 196 | 211 | 228 | 237 oad 
pv | a 205 | 203 | 209 | 213 0-25 | 177 | 191 | 206 | 216 | 222 
0:25 | Z9 | 188 | 198 | 204 | 209 0 |171 | 185 199 | 209 | 216 
EMT 497 0-9 | 531 | 555 | 578 | 590 
500 09 | 520 | 537 | 552 | 561 ae ‘ 07 | 494 | 517 | 541 | 555 566 
035 48 be 305 815 | 522 0-25 | 464 | 486 | 509 | 524 | 534 
Q | 467 | 482 | 498 | 508 | 515 O | 456 | 477 | 499 | 514 | 524 
Table 1 (cont.) 
: 
Required | a | | Required | d 
sample 125| L5 | 20 | 25 | 30 | p | sample 1-25 
size " Size " 
(km) e (km) d 
| 
48 09 | 56| 64] 73| 79| 83] 9 198 | 09 | 217 
0-7 | 43| 51] 59| 64| 68 | 0-7 | 189 
0:35 | 34| 40| 48| 53] 56 | 0-25 | 169 
0 31| 38| 45| 50| 53 0 163 
| 
104 0-9 | 116 | 129 | 142 | 149 | 154 504 0-9 | 537 
0-7 | 97 | 109 | 121 | 129 | 134 0-7 | 493 
0.95 | 84| 94) 105 | 112 | 117 0-25 | 459 
0 80 | 89 | 100 | 107 | 111 | 0 449 
j 
200 0-9 | 228 | 235 | 252 | 261 | 268 x 
0-7 | 192 | 208 | 225 | 235 | 243 | 19 50 i 5 
0-95 | 173 | 187 | 203 | 213 | 220 ilH 
0 167 | 178 | 196 | 206 | 213 0-95| 34 
504 0-9 | 533 | 561 | 585 | 599 | 608 | 9 a 
0-7 | 494 | 519 | 545 | 561 | 572 à 
0-25 | 462 | 485 | 511 | 526 | 537 "d ia E^ 
0 453 | 475 | 500 | 515 | 526 0-95 | 73 
0 67 
54 09 | 62| 73| 83| 89| 94 200 0-9 | 220 
0-7 | 49| 56| 66| 73| 77 0-7 | 191 
0-25 | 38 | 46] 55| 59| 63 ; 
0-25 | 169 
0 36| 43| 51| 56| 59 0 163 
99 0-9 | 111 | 125 | 188 | 146 | 152 500 0-9 | 534 
07 | 92 104 | 117 | 125 | 130 0-7 | 489 
0-25 | 77| 88 | 100 | 107 | 113 0-25 | 451 
0 74 | 84| 94 | 102 | 107 0 442 




Table 2. Ratio of expected cost of optimal sample to cost of restricted sampling 
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d 
k m dad’ 
                      1.5      2.5      3.0

  2    25    0.5    0.704    0.437    0.369
             0.1    0.711    0.451    0.382
             0.0    0.716    0.455    0.386
  2    50    0.5    0.693    0.426    0.357
             0.1    0.700    0.437    0.368
             0.0    0.703    0.439    0.371
  2   250    0.5    0.679    0.412    0.344
             0.1    0.682    0.416    0.348
             0.0    0.683    0.417    0.350
  3    17    0.5    0.719    0.452    0.383
             0.1    0.733    0.473    0.403
             0.0    0.733    0.478    0.408
  3    33    0.5    0.704    0.438    0.368
             0.1    0.717    0.453    0.382
             0.0    0.717    0.456    0.386
  3   167    0.5    0.682    0.416    0.348
             0.1    0.688    0.423    0.355
             0.0    0.689    0.425    0.357
  4    13    0.5    0.730    0.465    0.386
             0.1    0.751    0.490    0.417
             0.0    0.749    0.493    0.423
  4    25    0.5    0.712    0.446    0.377
             0.1    0.728    0.464    0.394
             0.0    0.727    0.468    0.398
  4   125    0.5    0.687    0.420    0.352
             0.1    0.694    0.428    0.360
             0.0    0.694    0.430    0.362
  5    10    0.5    0.741    0.476    0.404
             0.1    0.760    0.504    0.434
             0.0    0.762    0.510    0.440
  5    20    0.5    0.719    0.455    0.382
             0.1    0.732    0.474    0.404
             0.0    0.735    0.479    0.409
  5   100    0.5    0.690    0.424    0.355
             0.1    0.697    0.433    0.365
             0.0    0.699    0.435    0.366
 10     5    0.5    0.786    0.515    0.445
             0.1    0.801    0.558    0.487
             0.0    0.806    0.565    0.493
 10    10    0.5    0.745    0.482    0.409
             0.1    0.763    0.512    0.441
             0.0    0.767    0.518    0.448
 10    50    0.5    0.702    0.442    0.366
             0.1    0.711    0.450    0.380
             0.0    0.714    0.453    0.383
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The distribution of intervals between Successive maxima 
in a series of random numbers 


By D. S. PALMER
Marconi's Wireless Telegraph Co., Ltd 


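Let $u(r)$ be the $r$th in order of occurrence of a series of independent random numbers. Then $u(-1), u(0), \ldots, u(n+1)$ can occur in $(n+3)!$ different orders when placed in ascending order of magnitude. Let $Q(n)$ be the number of these orders in which $u(0)$ is a maximum, that is $u(-1) < u(0) > u(1)$, but there are no further maxima among $u(1), \ldots, u(n)$, and let $q(m, n)$ be the number of such orders with $u(n+1)$ in position $m$ from the bottom. Then

$$Q(n) = \sum_{m=1}^{n+3} q(m, n). \tag{1}$$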


Consider the $Q(n)$ orders, and the positions in which a further number $u(n+2)$ may be placed to make an order of the group $Q(n+1)$. Four cases must be distinguished.

(A) $u(n+1) < u(n)$, $u(n+1) < u(-1)$.
$u(n+1)$ must have the lowest place and $u(-1)$ may have any of $n+1$ places in the arrangement of $u(-1), \ldots, u(n+1)$. Hence $q(1, n) = n+1$. $u(n+2)$ may have any place, so this case contributes $n+1$ to each of $q(1, n+1), q(2, n+1), \ldots, q(n+4, n+1)$.

(B) $u(n+1) < u(n)$, $u(n+1) > u(-1)$.
Only one ordering of $u(-1), \ldots, u(n+1)$ is possible, and $u(n+2)$ may have any place. There arises a contribution of 1 to $q(1, n+1), q(2, n+1), \ldots, q(n+4, n+1)$.

(C) $u(n+1) > u(n)$ and $u(n+1)$ has second place.
There are $n$ possible positions of $u(-1)$ with regard to $u(0), \ldots, u(n+1)$. $u(n+2)$ may have any place but the two lowest. There arises a contribution of $n$ to $q(m, n+1)$ for each $m > 2$.

(D) $u(-1), \ldots, u(n+1)$ belong to $q(r, n)$ $(r > 2)$.
$u(n+2)$ must be $> u(n+1)$, contributing $q(r, n)$ to each $q(m, n+1)$ with $m > r$.

From cases (A), (B), (C) it follows that

$$q(1, n) = q(2, n) = n + 1. \tag{2}$$
Summarizing the contributions to $q(m, n+1)$ we have

Case    q(1, n+1)    q(2, n+1)    q(m, n+1)
(A)     n+1          n+1          n+1
(B)     1            1            1
(C)     0            0            n          (m > 2)
(D)     0            0            q(r, n)    (m > r)

Therefore $q(1, n+1) = q(2, n+1) = n+2$, agreeing with (2), and

$$q(m, n+1) = 2(n+1) + \sum_{r=3}^{m-1} q(r, n) \qquad (m \ge 3). \tag{3}$$
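Assume that

$$q(m, n) = 2^{m-2}\left(n - \frac{m-3}{2}\right), \qquad 3 \le m \le n+3. \tag{4}$$

Then

$$\begin{aligned}
q(m, n+1) &= 2(n+1) + \sum_{r=3}^{m-1} 2^{r-2}\left(n - \frac{r-3}{2}\right) \\
&= 2(n+1) + \left(n + \tfrac{3}{2}\right)\sum_{r=3}^{m-1} 2^{r-2} - \tfrac{1}{2}\sum_{r=3}^{m-1} r\,2^{r-2} \\
&= 2(n+1) + (2n+3)\sum_{r=0}^{m-4} 2^{r} - \sum_{r=0}^{m-4} (r+3)\,2^{r} \\
&= 2(n+1) + 2n\sum_{r=0}^{m-4} 2^{r} - \sum_{r=0}^{m-4} r\,2^{r} \\
&= 2(n+1) + 2n(2^{m-3} - 1) - (m-5)\,2^{m-3} - 2 \\
&= 2^{m-2}\left(n + 1 - \frac{m-3}{2}\right),
\end{aligned}$$

agreeing with (4).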


(4) is valid for n = 1 and 2 as may be seen by the following enumeration. Orders for which u(n+1) < u(n) are enclosed in brackets.

n = 1   q(1,1) = 2   (2431), (3421),
        q(2,1) = 2   (1432), 3412,
        q(3,1) = 2   2413, 1423,
        q(4,1) = 2   1324, 2314.
n = 2   q(1,2) = 3   (25431), (45321), (35421),
        q(2,2) = 3   (15432), 45312, 35412,
        q(3,2) = 4   15423, 25413, 45123, 45213,
        q(4,2) = 6   15234, 15324, 25134, 25314, 35124, 35214,
        q(5,2) = 8   13245, 14325, 14235, 23145, 24135, 24315, 34125, 34215.


Thus the validity of (4) is proved by induction.
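The enumeration above can also be checked mechanically. The following is a minimal sketch, assuming that an order is counted when u(0) is a maximum and none of u(1), ..., u(n) is a maximum, and that m is the rank of u(n+1) counted from the bottom; the function names are illustrative only and do not come from the note.

```python
from itertools import permutations


def count_q(n):
    """Brute-force counts {m: q(m, n)} over all (n+3)! rank orders."""
    size = n + 3
    counts = {m: 0 for m in range(1, size + 1)}
    for ranks in permutations(range(1, size + 1)):
        # ranks[j] is the rank (1 = lowest) of u(j - 1), so u(r) is ranks[r + 1].
        if not (ranks[0] < ranks[1] > ranks[2]):
            continue  # u(0) must be a maximum
        further_max = any(
            ranks[r] < ranks[r + 1] > ranks[r + 2] for r in range(1, n + 1)
        )
        if further_max:
            continue  # no maximum allowed at u(1), ..., u(n)
        counts[ranks[n + 2]] += 1  # position of u(n+1) from the bottom
    return counts


def q_formula(m, n):
    """q(m, n) from (2) for m = 1, 2 and from (4) for 3 <= m <= n + 3."""
    if m <= 2:
        return n + 1
    return 2 ** (m - 3) * (2 * n - m + 3)  # equals 2^(m-2) * (n - (m-3)/2)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_q(n), sum(count_q(n).values()))
```

For n = 1 and n = 2 the counts reproduce the listed values of q(m, n), and the totals agree with Q(n) = 2^(n+1)(n+1).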
By (3),

$$q(n+4, n+1) = 2(n+1) + \sum_{r=3}^{n+3} q(r, n) = \sum_{r=1}^{n+3} q(r, n) \ \text{ by (2)} \ = Q(n) \ \text{ by (1)}.$$

Hence, by (4),

$$Q(n) = 2^{n+2}\left(n + 1 - \tfrac{1}{2}(n+1)\right) = 2^{n+1}(n+1). \tag{5}$$


Of the $(n+3)!$ orders of $u(-1), \ldots, u(n+1)$, one-third satisfy the condition that $u(-1) < u(0) > u(1)$.

Hence $P(n)$ = chance of an interval of $n$ between successive maxima

$$= \frac{Q(n-1)}{\tfrac{1}{3}(n+2)!} - \frac{Q(n)}{\tfrac{1}{3}(n+3)!} = \frac{3n\,2^{n}}{(n+2)!} - \frac{3(n+1)\,2^{n+1}}{(n+3)!} = \frac{3 \cdot 2^{n}(n-1)(n+2)}{(n+3)!} \qquad (n \ge 2). \tag{6}$$
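Then for the moments we have

$$\begin{aligned}
E(e^{nt}) &= 3\sum_{n=2}^{\infty} e^{nt}\,2^{n}(n-1)(n+2)/(n+3)! \\
&= e^{2e^{t}}\{\tfrac{3}{2}e^{-t} - 3e^{-2t} + \tfrac{3}{2}e^{-3t}\} - \tfrac{3}{2}e^{-3t} + \tfrac{3}{2}e^{-t} + 1 \\
&= 1 + 3t + (\tfrac{3}{2}e^{2} - 6)\,t^{2} + \ldots
\end{aligned} \tag{7}$$

The first two terms are obviously correct, and the third gives

s.d. $= (3e^{2} - 21)^{1/2} = 1.080$.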
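The probabilities in (6) and the moments in (7) can be checked numerically. The sketch below assumes the form P(n) = 3 · 2^n (n − 1)(n + 2)/(n + 3)! for n ≥ 2 and truncates the series at n = 59, where the remaining terms are negligible; the names are illustrative.

```python
from fractions import Fraction
from math import e, factorial


def p_interval(n):
    """P(n) from equation (6), as an exact fraction, for n >= 2."""
    return Fraction(3 * 2 ** n * (n - 1) * (n + 2), factorial(n + 3))


# Truncated series: terms beyond n = 59 are far below double precision.
probs = {n: p_interval(n) for n in range(2, 60)}
total = float(sum(probs.values()))
mean = float(sum(n * p for n, p in probs.items()))
var = float(sum(n * n * p for n, p in probs.items())) - mean ** 2

if __name__ == "__main__":
    print(p_interval(2), p_interval(3), p_interval(4), p_interval(5))
    print(total, mean, var, 3 * e ** 2 - 21)
```

The first four probabilities come out as 2/5, 1/3, 6/35 and 1/15, and the variance agrees with 3e² − 21.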
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The first 2000 two-figure random numbers in Kendall and Babington Smith's Tables (Tracts for 
Computers No. XXIV) give 


  n     P(n)     Expected    Observed     χ²
  2     2/5       266.0        251       0.85
  3     1/3       221.7        241       1.71
  4     6/35      114.0        107       0.43
  5     1/15       44.3         50       0.73
 ≥6     1/35       19.0         16       0.47

                  665          665


Mean 3.01, s.d. 1.04, χ² (4 d.f.) = 4.19.
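The expected frequencies and the goodness-of-fit statistic can be recomputed from the observed counts. This is a sketch only: the observed counts are those read from the table above, the probabilities are taken from (6), and recomputed contributions may differ from the printed ones in the last figure because of rounding.

```python
from fractions import Fraction
from math import factorial


def p_interval(n):
    """P(n) from equation (6), exact, for n >= 2."""
    return Fraction(3 * 2 ** n * (n - 1) * (n + 2), factorial(n + 3))


# Observed numbers of intervals, as read from the table; "6+" pools n >= 6.
observed = {2: 251, 3: 241, 4: 107, 5: 50, "6+": 16}
n_intervals = sum(observed.values())

probs = {n: p_interval(n) for n in (2, 3, 4, 5)}
probs["6+"] = 1 - sum(probs.values())

expected = {k: float(p) * n_intervals for k, p in probs.items()}
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

if __name__ == "__main__":
    for k in observed:
        print(k, probs[k], round(expected[k], 1), observed[k])
    print("chi-square (4 d.f.):", round(chi2, 2))
```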


I am indebted to Marconi's Wireless Telegraph Co., Ltd. for permission to publish this note. 


The effect of ties on the moments of rank criteria 


By B. E. COOPER
University College, London 


1 1 
Ag = [ne Tp t DY (pa 1yeay 


—1 TH r+ tO pq m 1) 
Es alr Ha, _{i-t ( 
esu j ) " ( 2 ) ( 2 ) |. 
Now suppose there are $T$ groups of ties with $t_i$ in the $i$th group, which has common rank $v_i + \tfrac{1}{2}(N+1)$, for $i = 1, 2, \ldots, T$. The changes in the moments due to each group are additive, and so, after a little reduction, the population central moments are seen to be

$$\mu_2 = \frac{1}{12N}\Big[N(N^2-1) - \sum_{i=1}^{T} t_i(t_i^2-1)\Big],$$

$$\mu_3 = -\frac{1}{4N}\sum_{i=1}^{T} v_i\,t_i(t_i^2-1), \tag{2}$$

$$\mu_4 = \frac{1}{240}(N^2-1)(3N^2-7) - \frac{1}{240N}\sum_{i=1}^{T} t_i(t_i^2-1)(3t_i^2-7) - \frac{1}{2N}\sum_{i=1}^{T} v_i^2\,t_i(t_i^2-1).$$


* We give the moments rather than the k-statistics since the former are simpler.
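The tie corrections to the population moments can be checked against a direct calculation on the mid-ranks. The sketch below is written under an assumption about notation: v_i is taken to be the deviation of the common (mid-)rank of the ith group from ½(N + 1), and the corrections are the usual ones, t(t² − 1)/12 for squares and t(t² − 1)(3t² − 7)/240 for fourth powers. Function names are illustrative.

```python
from fractions import Fraction


def midranks(group_sizes):
    """Mid-ranks of N ordered observations; a group of size 1 is untied."""
    ranks, start = [], 0
    for t in group_sizes:
        mid = Fraction(2 * start + t + 1, 2)  # mean of start+1, ..., start+t
        ranks.extend([mid] * t)
        start += t
    return ranks


def central_moment(values, r):
    """r-th central moment of a finite population (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n


def tie_corrected_moments(group_sizes):
    """(mu2, mu3, mu4) from the closed forms with tie corrections."""
    N = sum(group_sizes)
    centre = Fraction(N + 1, 2)
    s2 = s3 = s4a = s4b = Fraction(0)
    start = 0
    for t in group_sizes:
        v = Fraction(2 * start + t + 1, 2) - centre  # deviation of common rank
        c = t * (t * t - 1)
        s2 += c
        s3 += v * c
        s4a += c * (3 * t * t - 7)
        s4b += v * v * c
        start += t
    mu2 = (N * (N * N - 1) - s2) / (12 * N)
    mu3 = -s3 / (4 * N)
    mu4 = (Fraction((N * N - 1) * (3 * N * N - 7), 240)
           - s4a / (240 * N) - s4b / (2 * N))
    return mu2, mu3, mu4


if __name__ == "__main__":
    groups = [1, 2, 1, 3, 1]  # hypothetical pattern of ties, N = 8
    ranks = midranks(groups)
    print([central_moment(ranks, r) for r in (2, 3, 4)])
    print(tie_corrected_moments(groups))
```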


y 
3. If it is assumed that two samples of sizes $n_1$ and $n_2$ are arranged together in ascending order of magnitude ($N = n_1 + n_2$), then Wilcoxon's procedure consists of finding the mean of the ranks of one sample, and using this criterion to test for the equivalence of the location parameters of the two populations generating the samples. The moments of the mean of the first sample (ties not admitted) may be written down immediately from Wishart's tables. When ties are allowed, the first moment is unchanged, and the second moment becomes

$$\mu_2(\bar{x}) = \frac{(N-n_1)(N+1)}{12\,n_1}\left[1 - \frac{1}{N(N^2-1)}\sum_{i=1}^{T} t_i(t_i^2-1)\right]. \tag{3}$$

The third and fourth moments do not simplify at all and they are accordingly most simply computed in any instance by putting numerical values of (2) into Wishart's formulae.

The variance of $\bar{x}$ given in (3) above has previously been derived by Hemelrijk (1952), Kruskal (1952) and Putter (1955). Computations were carried out for varying values of $N$ and $n_1$, and for different numbers of tied observations. It would appear for such values of $n_1$ and $N$ for which the normal approximation can be used in significance tests that the effect of ties is not likely to lead to serious error, provided that not more than half the observations are tied.


4. Other procedures have been put forward whereby tests can be carried out for possible differences in the parameters of dispersion of two parent populations. David (1956) suggested the variance of one sample might be used, while Mood (1954) proposed $W$, the sum of the squares of the deviations of the ranks of one sample from the population mean. It is clear that the same algebraic procedure used above for the sample mean can be used to derive corrections to the moments of both David's and Mood's criterion when ties are allowed. In fact these two criteria are merely the sample variance and the sample sum of squares about $\tfrac{1}{2}(N+1)$ of a sample from the finite populations whose first four moments are given by (2).
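The tie-corrected variance of the mean rank can be verified by enumerating every possible first sample. The sketch below assumes the variance takes the form (N − n₁)(N + 1)/(12 n₁) multiplied by 1 − Σ t(t² − 1)/{N(N² − 1)}, which is the finite-population variance of a mean applied to the mid-ranks; the tie pattern in the example is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def midranks(group_sizes):
    """Mid-ranks of N ordered observations; a group of size 1 is untied."""
    ranks, start = [], 0
    for t in group_sizes:
        ranks.extend([Fraction(2 * start + t + 1, 2)] * t)
        start += t
    return ranks


def mean_rank_variance_formula(group_sizes, n1):
    """Tie-corrected variance of the mean rank of a sample of size n1."""
    N = sum(group_sizes)
    ties = sum(t * (t * t - 1) for t in group_sizes)
    correction = 1 - Fraction(ties, N * (N * N - 1))
    return Fraction((N - n1) * (N + 1), 12 * n1) * correction


def mean_rank_variance_enumerated(group_sizes, n1):
    """Same variance, by listing all equally likely samples of size n1."""
    ranks = midranks(group_sizes)
    means = [sum(sample) / n1 for sample in combinations(ranks, n1)]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


if __name__ == "__main__":
    groups = [2, 1, 3, 1]  # hypothetical ties among N = 7 observations
    print(mean_rank_variance_formula(groups, 3))
    print(mean_rank_variance_enumerated(groups, 3))
```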
Thus using Abdel-Aty's (1954) general formulae, the first two moments of Mood's statistic become

$$E(W) = \frac{n_1}{12N}\Big[N(N^2-1) - \sum_{i=1}^{T} t_i(t_i^2-1)\Big],$$

$$\operatorname{var}(W) = \frac{n_1(N-n_1)}{N-1}\,(\mu_4 - \mu_2^2).$$

For David's statistic, $k_2$, we have similarly,

$$E(k_2) = \frac{1}{12(N-1)}\Big\{N(N^2-1) - \sum_{i=1}^{T} t_i(t_i^2-1)\Big\},$$

$$\operatorname{var}(k_2) = \frac{N(N-n_1)\{(Nn_1 - N - n_1 - 1)(N-1)\,\mu_4 - (N^2 n_1 - 3N^2 + 6N - 3n_1 - 3)\,\mu_2^2\}}{n_1(n_1-1)(N-1)^2(N-2)(N-3)}.$$

The variances of both these statistics do not simplify if the explicit values of $\mu_2$ and $\mu_4$ from (2) are substituted. It is simplest therefore to compute them, in any given instance, using the numerical values obtained from (2).

5. Although the correction to the variance of the mean has been given by several persons, as far as the writer knows there has been no discussion of corrections for higher moments of the mean, or for other criteria. The general expression in (1) enables as many corrected moments of the population as desired to be calculated, as in (2). Accordingly the only limitations are set by the extent of Wishart's and Abdel-Aty's tables.
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Approximations to the upper 5 % points of Fisher’s B distribution 
and non-central χ²*


By JOHN W. TUKEY
Princeton University


1. SUMMARY
Some time ago the writer reported (1949) on an empirical approximation to the upper 5% point of Fisher's $B$ distribution, the distribution of the square root of a non-central $\chi^2$, comparing the approximation with certain of the exact values given by Fisher (1928) and suggesting that the empirical formula might 'be useful for moderate extrapolation'. At about the same time Patnaik (1949), and later Abdel-Aty (1954), studied the problem of approximating the distribution of non-central $\chi^2$. Through the courtesy of Dr B. I. Harley, of University College, it is now possible to present a comparative table of the results of various approximations, including the streamlined use of moment-fitted Pearson curves by the aid of Table 42 of the new Biometrika tables (Pearson & Hartley, 1954).

While the empirical approximation is restricted to the upper 5% point, only the moment-fitting approach seems likely to give more accurate values in the range considered. Because of its simplicity, the empirical formula may prove useful.


2. DISTRIBUTION AND APPROXIMATIONS
The simplest definition of a non-central $\chi^2$ quantity is

$$z_1^2 + z_2^2 + \ldots + z_{n-1}^2 + (z_n + \beta)^2 = \chi'^2 = B^2,$$

where the $z$'s are independent unit normals and $\lambda = \beta^2$ is fixed. In discussing the $B$ distribution, Fisher used $\beta$ and $n_1 = n$. The empirical approximation to be considered is

$$B_{0.05} \approx 1.6449 + \beta + \frac{n_1 - 1}{\beta + 1}\left\{0.51 - 0.024\,\frac{n_1 - 5}{\beta}\right\}. \tag{A}$$

(The simplest way to use this for a 5% point of $\chi'^2$ is to find $B_{0.05}$ and then square the result.)

Patnaik's approximation begins with inverse interpolation in the percentage point tables of $\chi^2$ (first approximation, pp. 207-8) and then follows with a Cornish-Fisher expansion (second approximation, formula (25), page 213). A substantial amount of computation is required.

Abdel-Aty's approximation begins with a cube-root transformation and continues with an expansion of Cornish-Fisher type. The first approximation, that $\{\chi'^2/(n+\lambda)\}^{1/3} = \{B^2/(n_1+\beta^2)\}^{1/3}$ is roughly normal with variance

$$\frac{2}{9}\,\frac{n + 2\lambda}{(n+\lambda)^2} = \frac{2}{9}\,\frac{n_1 + 2\beta^2}{(n_1 + \beta^2)^2}$$

and mean one minus this variance, is quite simple, but the 'closer approximation' involves considerable computation.

The approximation involved in the use of Table 42 of the new Biometrika tables corresponds to fitting a Pearson curve to the first four moments of $\chi'^2$, and accepting the percentage points of the fitted Pearson curve.


3. ACCURACY AND COMPARISON
Fisher's table covers $n_1 = 1(1)7$ and $\beta = 0$ ... A comparison of exact and approximate values on a reasonably rude grid is made in Table 1, for $n_1 = 1(2)7$ and $\beta = 1(2)5$. The disagreement is nowhere worse than 0.035 and reaches no more than 0.0037 for $\beta \ge 3$. The fit is encouraging.

A comparison of the various methods of approximation is made in Table 2 in terms of non-central $\chi^2$ upper 5% points, squares of $B_{0.05}$, ... and computed by Dr B. I. Harley. As noted in the summary, this table was undertaken ...

The values obtained through moments and Table 42 require the computation of the first four moments ...

$$\beta_1 = \frac{8(1+2b)^2}{r(1+b)^3}, \qquad \beta_2 = 3 + \frac{12(1+3b)}{r(1+b)^2},$$

where

$$r = n + \lambda = n_1 + \beta^2 \quad \text{and} \quad b = \lambda/(n+\lambda) = \beta^2/(n_1 + \beta^2).$$

Once these values are available, the use of Table 42 ... values for a moderate selection of percentage points ... The accuracy at the upper 5% point is excellent.

* ... by the U.S. Office of Naval Research.
~ 
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Table 1. Accuracy of approximation (A) 


 β    Value      n₁ = 1    n₁ = 3    n₁ = 5    n₁ = 7
 1    Exact      2.6461    3.1941    3.6291    4.0005
      Approx.    2.6449    3.2029    3.6649    4.0309
 3    Exact      4.6449    4.9055    5.1512    5.3840
      Approx.    4.6449    4.9079    5.1549    5.3859
 5    Exact      6.6449    6.8162    6.9831    7.1457
      Approx.    6.6449    6.8181    6.9849    7.1453
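The 'Approx.' rows of Table 1 can be reproduced with a few lines of code. The sketch below assumes that approximation (A) has the form β + 1.6449 + (n₁ − 1)/(β + 1) · {0.51 − 0.024(n₁ − 5)/β}; this form was chosen because it reproduces every tabulated approximate value, and it should be read as a reconstruction rather than a quotation. The exact values are copied from Table 1.

```python
def b_upper_5pct(beta, n1):
    """Empirical approximation (A) to the upper 5% point of B, for beta > 0.

    The functional form is an assumption that matches the 'Approx.' rows
    of Table 1; it is not valid at beta = 0.
    """
    return beta + 1.6449 + (n1 - 1) / (beta + 1) * (0.51 - 0.024 * (n1 - 5) / beta)


# Exact upper 5% points of B from Table 1, keyed by (beta, n1).
EXACT = {
    (1, 1): 2.6461, (1, 3): 3.1941, (1, 5): 3.6291, (1, 7): 4.0005,
    (3, 1): 4.6449, (3, 3): 4.9055, (3, 5): 5.1512, (3, 7): 5.3840,
    (5, 1): 6.6449, (5, 3): 6.8162, (5, 5): 6.9831, (5, 7): 7.1457,
}

if __name__ == "__main__":
    for (beta, n1), exact in sorted(EXACT.items()):
        approx = b_upper_5pct(beta, n1)
        print(beta, n1, exact, round(approx, 4), round(approx - exact, 4))
    # Upper 5% point of non-central chi-square: square the result.
    print(round(b_upper_5pct(4, 2) ** 2, 2))
```

Squaring the value for β = 4, n₁ = 2 gives the corresponding upper 5% point of non-central χ² with λ = 16.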


Table 2. Approximations to the upper 5% points of the non-central χ² distribution


                                      Patnaik's approx.    Abdel-Aty's approx.    Moments and
 n = n₁   λ = β²    Exact     (A)       1st       2nd         1st       2nd         Table 42
   2        1        8.64     8.62      8.63       -          8.56      8.38           -
            4       14.64    14.65     14.72     14.67       14.66     14.62           *
           16       33.05*   33.07     33.35     33.06       33.32     33.08         33.07
           25       45.31    45.32     45.66       -         45.64     45.33           -
   3        1       10.20    10.26     10.20       -           -         -             -
   4        1       11.71    11.87     11.72       -         11.67     11.67           -
            4       17.31    17.36     17.38     17.33       17.34     17.27           *
           16       35.43    35.46     35.60     35.42       35.66     35.44         35.42
           25       47.61    47.64     47.94       -         47.91     47.62           -
   7        1       16.00    16.25     16.01       -         15.98     15.99           -
            4       21.23    21.32     21.28     21.27       21.25     21.21         21.23
            9       28.99    29.01     29.12       -           -         -             -
           16       38.97    38.97     39.16     38.97       39.16     38.96         38.95
           25       51.06    51.06     51.34       -         51.33     51.06           -


* These lie outside the β₁ range for Table 42.


There is some reason to suspect, however, that its accuracy may be close to its best at this particular percentage point.

If other than an upper 5% point is desired, the method using Table 42 seems simplest and as likely to be accurate as any. For an upper 5% point, approximation (A) seems simpler and almost as accurate. (The user demanding highest accuracy may well consider applying Table 42 to Abdel-Aty's cube-root transformation, since the formulas for its $\beta_1$ and $\beta_2$ are given at the foot of his page 538.)


4. SOURCE OF APPROXIMATION (A)


For large $\beta$,

$$B^2 = \beta^2 + 2\beta z_n + (z_1^2 + \ldots + z_n^2),$$

whence

$$B = \beta + z_n + O(1/\beta).$$

Thus the upper 5% point of $B - \beta$ approaches 1.6449. This fixes the first two terms of (A). The last two were found by inspection of a table of $B_{0.05} - 1.6449 - \beta$. Presumably similar approximations could be obtained by similar methods from brief tables of other exact percentage points.
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Uniqueness of a result in the theory of accident proneness 


By N. L. JOHNSON
(University College, London) 


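1. A simple type of model for accident proneness (see Greenwood & Yule (1920), Arbous & Kerrich (1951), Bates & Neyman (1952), and Arbous & Sichel (1954)) is obtained by supposing

(i) the number of accidents sustained by a given individual in a period of fixed ('unit') length (e.g. 1 year) may be represented by a Poisson variable $x$ with

$$\Pr\{x\} = \frac{e^{-\lambda}\lambda^{x}}{x!} \qquad (x = 0, 1, 2, \ldots),$$

and (ii) $\lambda$ ($> 0$) varies from individual to individual with a probability density function $p(\lambda)$.

If $p(\lambda)$ be assumed to follow the Pearson type III distribution

$$p(\lambda) = \left(\frac{k}{m}\right)^{k}\frac{\lambda^{k-1}e^{-k\lambda/m}}{\Gamma(k)} \qquad (0 < \lambda),$$

then the number of accidents, $y$, sustained in a first period of unit length has the distribution

$$\begin{aligned}
\Pr\{y\} &= \int_0^{\infty}\frac{e^{-\lambda}\lambda^{y}}{y!}\,p(\lambda)\,d\lambda \\
&= \left(\frac{k}{m}\right)^{k}\frac{1}{\Gamma(k)\,\Gamma(y+1)}\int_0^{\infty}\lambda^{y+k-1}e^{-(1+k/m)\lambda}\,d\lambda \\
&= \frac{\Gamma(k+y)}{\Gamma(k)\,\Gamma(y+1)}\left(\frac{k}{m+k}\right)^{k}\left(\frac{m}{m+k}\right)^{y},
\end{aligned}$$

i.e. a negative binomial distribution with mean $k(m/k) = m$.

2. The probability density function of $\lambda$, given $y$, is

$$p(\lambda \mid y) = \frac{e^{-\lambda}\lambda^{y}p(\lambda)}{\int_0^{\infty}e^{-\lambda}\lambda^{y}p(\lambda)\,d\lambda}. \tag{1}$$

Hence, if $z$ denotes the number of accidents sustained by the same individual in a second period of unit length,

$$\begin{aligned}
\Pr\{z \mid y\} &= \int_0^{\infty}\frac{e^{-\lambda}\lambda^{z}}{z!}\,p(\lambda \mid y)\,d\lambda \\
&= \frac{1}{\Gamma(z+1)}\,\frac{\int_0^{\infty}\lambda^{y+z+k-1}e^{-(2+k/m)\lambda}\,d\lambda}{\int_0^{\infty}\lambda^{y+k-1}e^{-(1+k/m)\lambda}\,d\lambda} \\
&= \frac{\Gamma(k+y+z)}{\Gamma(k+y)\,\Gamma(z+1)}\left(\frac{m+k}{2m+k}\right)^{k+y}\left(\frac{m}{2m+k}\right)^{z}.
\end{aligned}$$

The conditional distribution of $z$, given $y$, is therefore a negative binomial with mean $(k+y)\dfrac{m}{m+k}$ and variance

$$(k+y)\,\frac{m}{m+k}\left(1 + \frac{m}{m+k}\right) = \frac{m(k+y)(2m+k)}{(m+k)^{2}}.$$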

The regression of $z$ on $y$ is linear (see also Arbous & Sichel (1954)).

It may also be noted that $\mathcal{E}(z \mid y = 0) = mk/(m+k)$, while $\mathcal{E}(z) = \mathcal{E}(y) = m$, so that selection of individuals free from accidents in the first period reduces the accident rate to be expected in the second period by a factor $k/(m+k)$.
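The linear regression of z on y under the type III assumption can be checked numerically. The sketch below assumes the conditional distribution of z given y is negative binomial with index k + y and probabilities built from (m + k)/(2m + k) and m/(2m + k), sums the probabilities directly, and compares the resulting mean and variance with (k + y)m/(m + k) and m(k + y)(2m + k)/(m + k)². The parameter values are hypothetical.

```python
from math import exp, lgamma, log


def pr_z_given_y(z, y, k, m):
    """Negative binomial probability of z accidents, given y earlier accidents."""
    log_p = (lgamma(k + y + z) - lgamma(k + y) - lgamma(z + 1)
             + (k + y) * log((m + k) / (2 * m + k))
             + z * log(m / (2 * m + k)))
    return exp(log_p)


def conditional_moments(y, k, m, z_max=400):
    """Total probability, mean and variance of z given y, by direct summation."""
    probs = [pr_z_given_y(z, y, k, m) for z in range(z_max)]
    total = sum(probs)
    mean = sum(z * p for z, p in enumerate(probs))
    var = sum(z * z * p for z, p in enumerate(probs)) - mean ** 2
    return total, mean, var


if __name__ == "__main__":
    k, m = 2.5, 1.2  # hypothetical shape and mean accident rate
    for y in range(4):
        total, mean, var = conditional_moments(y, k, m)
        print(y, round(total, 6), round(mean, 6), round((k + y) * m / (m + k), 6))
```

The printed means increase linearly with y, with slope m/(m + k), and the y = 0 row gives mk/(m + k).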

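3. It is the purpose of this note to show that if $p(\lambda)$ is not a type III probability density function, the regression of $z$ on $y$ is not linear.

Since $\mathcal{E}(z \mid \lambda) = \lambda$ it follows that

$$\mathcal{E}(z \mid y) = \int_0^{\infty}\lambda\,p(\lambda \mid y)\,d\lambda.$$

Hence, from (1),

$$\mathcal{E}(z \mid y) = \int_0^{\infty}e^{-\lambda}\lambda^{y+1}p(\lambda)\,d\lambda \Big/ \int_0^{\infty}e^{-\lambda}\lambda^{y}p(\lambda)\,d\lambda$$

and so, for the regression of $z$ on $y$ to be linear, we must have

$$\alpha + \beta y = \frac{\int_0^{\infty}\lambda^{y+1}e^{-\lambda}p(\lambda)\,d\lambda}{\int_0^{\infty}\lambda^{y}e^{-\lambda}p(\lambda)\,d\lambda}, \tag{2}$$

where $\alpha$ and $\beta$ are constants satisfying $\alpha + \beta m = m$ (since $\mathcal{E}(z) = \mathcal{E}(y) = m$) and $\beta \ge 0$ (since $\mathcal{E}(z \mid y) > 0$ for all $y$).

Equation (2) can be rewritten

$$\alpha + \beta y = \frac{\mu'_{y+1}}{\mu'_{y}}, \tag{3}$$

where $\mu'_r$ is the $r$th moment about zero of the probability density function

$$e^{-\lambda}p(\lambda) \Big/ \int_0^{\infty}e^{-\lambda}p(\lambda)\,d\lambda \qquad (\lambda \ge 0). \tag{4}$$

Since $\mu'_0 = 1$, it follows that

$$\mu'_1 = \alpha > 0, \quad \mu'_2 = \alpha(\alpha+\beta), \quad \mu'_3 = \alpha(\alpha+\beta)(\alpha+2\beta),$$

and in general

$$\mu'_r = \alpha(\alpha+\beta)\ldots(\alpha+(r-1)\beta).$$

Hence $\mu'_r < (r\beta')^{r}$ and so $\limsup_{r\to\infty}\,(\mu'_r)^{1/r}/r \le \beta'$, where $\beta' = \alpha + \beta$.

Hence the moments $\{\mu'_r\}$ determine the distribution (4) uniquely; and $p(\lambda)$ is, therefore, also determined uniquely by (2). A function $p(\lambda)$ of type III form does satisfy (2), and this is therefore the only form of probability density function satisfying (2).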
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; A. G. & Stoner, H. S. (1954). Biometrika, 41, 77. ; 
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A note on the mean deviation of the binomial distribution 


By N. L. JOHNSON 
University College, London 


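In a paper dealing with the mathematical theory of risk in insurance Gruder (1930) has shown that

$$\begin{aligned}
\sum_{r=m}^{n}\binom{n}{r}p^{r}q^{n-r}(r - np) &= n\left[\sum_{r=m}^{n}\binom{n-1}{r-1}p^{r}q^{n-r+1} - \sum_{r=m}^{n-1}\binom{n-1}{r}p^{r+1}q^{n-r}\right] \\
&= n\binom{n-1}{m-1}p^{m}q^{n-m+1} \\
&= m\binom{n}{m}p^{m}q^{n-m+1}.
\end{aligned} \tag{1}$$

This result may be applied to the evaluation of the mean deviation of a binomial distribution, an application which, it is believed, is new. The mean deviation of the binomial distribution is equal to

$$\sum_{r=0}^{n}\binom{n}{r}p^{r}q^{n-r}\,|r - np| = -\sum_{r=0}^{np+\epsilon-1}\binom{n}{r}p^{r}q^{n-r}(r - np) + \sum_{r=np+\epsilon}^{n}\binom{n}{r}p^{r}q^{n-r}(r - np),$$

where $np + \epsilon$ is the least integer greater than $np$.

Since

$$\sum_{r=0}^{n}\binom{n}{r}p^{r}q^{n-r}(r - np) = 0,$$

$$\sum_{r=0}^{n}\binom{n}{r}p^{r}q^{n-r}\,|r - np| = 2\sum_{r=np+\epsilon}^{n}\binom{n}{r}p^{r}q^{n-r}(r - np).$$

Hence, using (1), the mean deviation of the binomial distribution is

$$2(np+\epsilon)\binom{n}{np+\epsilon}p^{np+\epsilon}q^{nq-\epsilon+1}. \tag{2}$$

The ratio of mean deviation to standard deviation is

$$R = \frac{2(np+\epsilon)}{\sqrt{n}}\binom{n}{np+\epsilon}p^{np+\epsilon-\frac{1}{2}}q^{nq-\epsilon+\frac{1}{2}}. \tag{3}$$

Applying Stirling's formula in the form

$$N! \approx \sqrt{2\pi}\,N^{N+\frac{1}{2}}e^{-N}\left(1 + \frac{1}{12N}\right)$$

we obtain

$$R \approx \sqrt{\frac{2}{\pi}}\left(1 + \frac{\epsilon}{np}\right)^{-(np+\epsilon-\frac{1}{2})}\left(1 - \frac{\epsilon}{nq}\right)^{-(nq-\epsilon+\frac{1}{2})}\left(1 + \frac{1}{12n}\right)\left(1 + \frac{1}{12(np+\epsilon)}\right)^{-1}\left(1 + \frac{1}{12(nq-\epsilon)}\right)^{-1}.$$

If $np$ and $nq$ are large

$$\left(1 + \frac{\epsilon}{np}\right)^{np} \approx e^{\epsilon}\left(1 - \frac{\epsilon^{2}}{2np}\right), \qquad \left(1 - \frac{\epsilon}{nq}\right)^{nq} \approx e^{-\epsilon}\left(1 - \frac{\epsilon^{2}}{2nq}\right),$$

$$\left(1 + \frac{\epsilon}{np}\right)^{\epsilon-\frac{1}{2}} \approx 1 + \frac{\epsilon(\epsilon-\frac{1}{2})}{np}, \qquad \left(1 - \frac{\epsilon}{nq}\right)^{\epsilon-\frac{1}{2}} \approx 1 - \frac{\epsilon(\epsilon-\frac{1}{2})}{nq},$$

and

$$R \approx \sqrt{\frac{2}{\pi}}\left\{1 + \frac{\epsilon(1-\epsilon)}{2npq} - \frac{1 - pq}{12npq}\right\}. \tag{4}$$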
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The table below compares the exact formula (3) with the approximate formula (4) for some typical 
values of n and p. The ratio takes the same value for p = x and p = 1 − x, so values of p are restricted to 
the range 0.1(0.1)0.5. It will be noted that the Normal limiting value of √(2/π) = 0.7979 is approached 
quite rapidly as n increases. The approximate formula (4) gives nearly three-figure accuracy for 
0.2 ≤ p ≤ 0.8, while for n ≥ 50 the exact and approximate values agree to four decimal places. 


   p         0.1               0.2               0.3               0.4               0.5
   n     (3)      (4)      (3)      (4)      (3)      (4)      (3)      (4)      (3)      (4)
  10   0.7351   0.7313   0.7640   0.7630   0.7733   0.7729   0.7771   0.7768   0.7782   0.7779
  20   0.7652   0.7642   0.7807   0.7804   0.7853   0.7854   0.7874   0.7874   0.7879   0.7879
  50        0.7846            0.7909            0.7929            0.7937            0.7939
 100        0.7912            0.7949            0.7954            0.7958            0.7959
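The entries of the table can be recomputed directly. The sketch below assumes the exact ratio is 2m C(n, m) p^m q^(n−m+1)/√(npq), with m the least integer greater than np, and assumes the approximation has the form √(2/π){1 + ε(1 − ε)/(2npq) − (1 − pq)/(12npq)}; p is passed as an exact fraction so that m is found without rounding error. A direct summation of |r − np| is included as a cross-check. Recomputed values of (4) may differ from the printed ones in the last figure.

```python
from fractions import Fraction
from math import comb, floor, pi, sqrt


def ratio_exact(n, p):
    """Mean deviation / standard deviation of a binomial(n, p), closed form."""
    p = Fraction(p)
    q = 1 - p
    m = floor(n * p) + 1  # least integer greater than np
    mean_dev = 2 * m * comb(n, m) * p ** m * q ** (n - m + 1)
    return float(mean_dev) / sqrt(float(n * p * q))


def ratio_direct(n, p):
    """Same ratio, by summing |r - np| over the whole distribution."""
    p = Fraction(p)
    q = 1 - p
    mean_dev = sum(comb(n, r) * p ** r * q ** (n - r) * abs(r - n * p)
                   for r in range(n + 1))
    return float(mean_dev) / sqrt(float(n * p * q))


def ratio_approx(n, p):
    """Large-sample approximation to the ratio (assumed form of (4))."""
    p = Fraction(p)
    q = 1 - p
    eps = float(floor(n * p) + 1 - n * p)
    npq = float(n * p * q)
    return sqrt(2 / pi) * (1 + eps * (1 - eps) / (2 * npq)
                           - (1 - float(p * q)) / (12 * npq))


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        row = []
        for tenths in range(1, 6):
            p = Fraction(tenths, 10)
            row.append((round(ratio_exact(n, p), 4), round(ratio_approx(n, p), 4)))
        print(n, row)
```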
REFERENCE 


GRUDER, O. (1930). 9/h International Congress of Actuaries, vol. 1, pi 


A comment on D. V. Lindley’s statistical paradox 


By M. S. BARTLETT 
University of Manchester 


I read with considerable interest the discussion by D. V. Lindley (1957), demonstrating the possibility of contradiction between the result of a statistical significance test and an assessment of the posterior probability of a null hypothesis. I would agree that he establishes the point that one must be cautious when using a fixed significance level for testing a null hypothesis irrespective of the size of sample one is taking. However, there is a slip, in his expression for K under his equation (1), that appears to me, unless corrected, to lead to an overstatement of this point. The prior distribution for $\theta$, given that $\theta \ne \theta_0$, was assumed to be uniform over an interval $I$, and hence its density function should be $1/I$ in this interval. This leads to the extra factor $1/I$ in the second term in the expression for K.* This expression then becomes consistent with Jeffreys's equation (10), § 5.0, in his book (second edition, 1948).

The occurrence of $I$ in the formula for the posterior probability $\bar{c}$ of the null hypothesis $\theta_0$, this quantity $\bar{c}$ satisfying approximately the relation

$$\frac{\bar{c}}{1-\bar{c}} = \frac{c}{1-c}\,\frac{I\sqrt{n}}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\lambda^{2}}, \tag{1}$$

where $c$ is the prior probability of $\theta_0$, now makes the value of $\bar{c}$ much more arbitrary. In fact, in situations where one might be tempted to put $I$ infinite the silly answer $\bar{c} = 1$ ensues. D. V. Lindley has suggested to me, in correspondence, that one way out of this dilemma would be to make $c/(1-c)$ the prior odds in favour of the null hypothesis against any unit interval of the alternative values, but this is rather an artificial evasion of the difficulty. It is common for those who use the Bayes's approach to assume a uniform prior distribution for the mean of a normal population. If the difference in means between two normal populations (for simplicity, of known equal variances) is considered, the question might legitimately be asked whether these populations, from each of which a sample is available, are identical. The most natural prior probabilities would seem to me, if we try to use the Bayes's approach, to be $c$ for this null hypothesis, and a uniform prior distribution of the true difference in population means over the entire infinite range.

The other point that might be noticed about formula (1), if we disregard the above difficulty and agree to leave $I$ finite, is this. Certainly, for a fixed significance level (and $I$ and $c$ fixed), the posterior odds increase with $\sqrt{n}$. But from the Neyman-Pearson theory of the power of tests, if a null hypothesis $\theta_0$ is being tested against a single alternative $\theta_1$, the sample size $n$ would if possible be chosen in relation to the 'distance' $d = \theta_1 - \theta_0$ between the two hypotheses ($\sqrt{n}$ inversely proportional to $d$). The situation under discussion is a little more complicated but, with a range of $I$ for the alternatives, it would be fairly reasonable to choose the sample size $n$ analogously, making $\sqrt{n}$ proportional to $1/I$. If we write $\sqrt{n}/\sigma = \lambda_0/I$, we obtain

$$\frac{\bar{c}}{1-\bar{c}} = \frac{c}{1-c}\,\frac{\lambda_0}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\lambda^{2}}, \tag{2}$$

where $\lambda = \sqrt{n}(\bar{x} - \theta_0)/\sigma$; in (2) there is a constant relation between $\bar{c}$ and $\lambda$ for fixed $c$.
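The two points made about the posterior odds can be illustrated numerically. The sketch below assumes the approximate relation has the form: posterior odds = prior odds × I√n/(σ√(2π)) × exp(−λ²/2). The first loop holds the significance level fixed (λ = 1.96) and lets n grow; the second chooses n so that √n/σ = λ₀/I, which makes the odds independent of I. All numerical values are hypothetical.

```python
from math import exp, pi, sqrt


def posterior_odds(c, interval, n, sigma, lam):
    """Approximate posterior odds in favour of the null hypothesis.

    Assumed form: prior odds * I*sqrt(n)/(sigma*sqrt(2*pi)) * exp(-lam**2 / 2),
    where `interval` is the length I of the uniform prior range.
    """
    prior_odds = c / (1 - c)
    return prior_odds * interval * sqrt(n) / (sigma * sqrt(2 * pi)) * exp(-0.5 * lam ** 2)


def posterior_probability(c, interval, n, sigma, lam):
    """Posterior probability of the null hypothesis implied by the odds."""
    odds = posterior_odds(c, interval, n, sigma, lam)
    return odds / (1 + odds)


if __name__ == "__main__":
    c, sigma, lam = 0.5, 1.0, 1.96  # result just significant at the 5% level
    for n in (10, 100, 10_000, 1_000_000):
        print(n, round(posterior_probability(c, 1.0, n, sigma, lam), 4))

    lam0 = 5.0  # hypothetical constant fixing sqrt(n)/sigma = lam0 / I
    for interval in (0.5, 1.0, 2.0):
        n = (lam0 * sigma / interval) ** 2
        print(interval, round(posterior_odds(c, interval, n, sigma, lam), 6))
```

With I fixed, the posterior probability of the null hypothesis rises towards unity as n increases even though the result stays 'just significant'; with √n proportional to 1/I the odds no longer depend on I.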


REFERENCES 


JEFFREYS, H. (1948). Theory of Probability, 2nd ed. Oxford: Clarendon Press. 
LINDLEY, D. V. (1957). A statistical paradox. Biometrika, 44, 187-92. 


Editorial Note. Mr Lindley agrees and apologizes for the fact that a factor 1/I was omitted from equation (1), but points out that in the two examples which he discusses this factor is unity. His argument as to the limiting value of $\bar{c}$ is in any case unaffected, and his two particular examples are also unaffected. There appears to be no real difference of opinion between Prof. Bartlett and Mr Lindley on this point.

The point raised by Prof. Bartlett's second paragraph is related to the difficulty of laying down a uniform prior probability for a parameter of infinite range, a point which, in my opinion, has not been properly cleared up; if the probability of $x$ is $k\,dx$, integration over the infinite range leads to the conclusion that $k$ is zero (the integral having to be unity). The root of this difficulty seems to be that several limiting processes are involved and no clear rules have been laid down as to which, if any, has precedence. In any case, this point mainly concerns estimation, whereas Mr Lindley was concerned with testing hypotheses.

In regard to Prof. Bartlett's final point, it may be useful to observe that some procedure of the type he suggests is implicit in the idea of the asymptotic relative efficiency of a test. In general, the power of a test against a specific alternative tends to unity with increasing sample size. To compare two tests asymptotically at a fixed significance level, it is necessary to allow the alternative to approach the null hypothesis as the sample size increases.
M.G.K.


* There is also a further dropping of a factor 1/c in the last formula on p. 191, but this is a more trivial slip.


CORRIGENDA 
Biometrika (1957), 44, pp. 150-8


'The use of a concomitant variable in selecting an experimental design.' By D. R. Cox


Dr K. R. Nair has kindly pointed out that some of the results for Methods II and V of the above paper have been given by him in Sankhya (1942), 6, 167-174. He has also noted that in formula (6) of my paper (z;— €)? should read kx; 2.


D.R.C.

Biometrika (1957), 44, pp. 168-78

'Multiple runs.' By D. E. BARTON and F. N. DAVID

P. 174, line 10. Delete 'transition probabilities' and substitute 'joint probabilities of two successive events'.
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REVIEWS 


Proceedings of the Third Berkeley Symposium on Mathematical Statistics and 
Probability. Edited by JERZY NEYMAN. University of California Press, for whom 
Cambridge University Press act as agents. 1957. Vol. I: Theory of Statistics, 
pp. 208, 45s. Vol. II: Probability Theory, pp. 246, 49s. Vol. III: Astronomy and 
Physics, pp. 252, 47s. Vol. IV: Biology and Problems of Health, pp. 179, 43s. 6d. 
Vol. V: Econometrics, Industrial Research and Psychometry, pp. 184, 43s. 6d. 


Journal space and the limited versatility of the reviewer would preclude anything like a critical 
appraisal of these books in which are printed all the papers presented at the third symposium held in 
1954/55 at the Statistical Laboratory of the University of California at Berkeley. I propose therefore 
to write a ‘contents’ review in order to show that there is something for everyone in the reported 
proceedings. 

Volume I. There are papers by: J. Berkson, 'Estimation by Least Squares and by Maximum 
Likelihood'; Z. W. Birnbaum, 'On a Use of the Mann-Whitney Statistic'; H. Chernoff and H. Rubin, 
'The Estimation of the Location of a Discontinuity in Density'; A. Dvoretzky, 'On Stochastic 
Approximation'; S. Ehrenfeld, 'Complete Class Theorems in Experimental Design'; G. Elfving, 
'Selection of Nonrepeatable Observations for Estimation'; U. Grenander and M. Rosenblatt, 'Some 
Problems in Estimating the Spectrum of a Time Series'; J. L. Hodges and E. L. Lehmann, 'Two 
Approximations to the Robbins-Monro Process'; W. Hoeffding, 'The Role of Assumptions in Statistical 
Decisions'; S. Karlin, 'Decision Theory for Pólya Type Distributions. Case of Two Actions'; L. Le Cam, 
'On the Asymptotic Theory of Estimation and Testing Hypotheses'; H. Robbins, 'An Empirical Bayes 
Approach to Statistics'; M. Rosenblatt, 'Some Regression Problems in Time Series Analysis'; C. Stein, 
'Efficient Nonparametric Testing and Estimation' and 'Inadmissibility of the Usual Estimator for the 
Mean of a Multivariate Normal Distribution'; B. L. van der Waerden, 'The Computation of the 
X-distribution'. 


The majority of the papers are on technical points and in the mathematical style familiar to readers of the Annals of Mathematical Statistics. The most interesting paper to the reviewer is that by ..., who with his commonsense approach and tenacity of purpose refuses to be blinded by mathematics and who in this paper raises once again the challenge in the problem.
Volume II. There are papers by: D. Blackwell, *On a Class of Erobatility padra ; 5S. pose 
*Stati i "iodicity of Random-Valued Functions ; &. u. hung, Founda- 
Stationarity, Boundedness, Almost Periodicity t Copeland, "Postátitiiab, Obusr- 


ti i ter Markov Chains’; 
ions of the Theory of Continuous Parameter ona eo e ees gate Oor- 


vati ictions’; J. L. Doob, ‘Probability pher ; Man 
eei RM. Fortet, ‘Random Distributions with. an Application to aci sa wire ime ; 
J. M. Hammersley, ‘The Zeros of a Random POM : <= Ne edad devon of E 
Markov Processes’; K. Itô, ‘Isotropic Random Current ; ©. -< bg gies : E: ae n 
Motion, and 4 General Theory of Gaussian Random Functions i M. Loéve, dea aoe doni [b j 
E. Lukacs, ‘Characterization of Populations by Properties of Suitable piers i Seas ag ices om 
Variables from the Point of View of a General Theory of Variables ; E. Se uiis sin 
Banach Spaces’; R. Salem and A. Zygmund, ‘Random pu ep us i am fit: Gf othar-probabilist 
The papers in this volume appear to be written by probabilists ca the a cm Sees a S 
and most of them reflect the current interest of probabilists m in. um Ed Em lied Pratis 
immediate statistical link-up—there is no reason why there shou ei pp. a 


£enerall ill find this volume unrewarding. The theoretical statistician seeking to apply the random 
y Ww: 


i ful. 3 

u > otn RE x em = ‘divided into two sections, one for astronomy and one for physics. The 
m lee III. tronomy consists of (i) Hertzsprung-Russell Diagram, with papers by: O. J. Eggen, 
"m ri utions to as Jui the Color and the Luminosity of Stars near the Sun'; J . L. Greenstein, 
he Relationship between Normal Main Sequence’; H. L. Johnson, 


ies of Stars Lying Below the Iain i 
ae a oh eee Mugmnitaclen and Colors’; G. E. Kron, Evidence for Sequences in the 
ectric 


Ye 6 ,' The Hertzsprung-Russell Diagram *; and of (ii) Spatial 
Color-Luminosity for M-Dwarfs peii ard C. MeVittie, ‘Galaxies, Statistics and Relativity’; 


renee E : z by: 1 s de V 
b ood of eae So hash Me cing of Images of Galaxies’; F. Zwicky, paces of 
tae terii The contributions to physics are from: A. Blanc-Lapierre and A. Tortrat, 
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TP 3. 8 roll 
‘ sati 7 i Probability Theory’; M. Kac, ‘Foundations of Kinetic Theory’; E. Montro , 
id un MERE of Simple Cubic Lattices’; and N. Weiner, ‘Nonlinear Prediction and 
Statisticians of all types will find this volume of interest and need not be deterred by a lack of
knowledge of astronomy. The papers are well and clearly written, and it appears ... to
get some idea of work being done by statisticians in fields other than those we think of as
..., economics, etc.
Volume IV. There are papers by: J. Crow and M. Kimura, ‘Some Genetic Problems in Natural
Populations’; E. R. Dempster, ‘Some Genetic Problems in Controlled Populations’; J. Neyman,
T. Park and E. L. Scott, ‘Struggle for Existence. The Tribolium Model’; M. S. Bartlett, ‘Deterministic
and Stochastic Models for Recurrent Epidemics’; A. T. Bharucha-Reid, ‘On the Stochastic Theory of
Epidemics’; C. L. Chiang, J. L. Hodges and J. Yerushalmy, ‘Statistical Studies in Medical Diagnosis’;
J. Cornfield, ‘A Statistical Problem Arising from Retrospective Studies’; D. G. Kendall, ‘Deterministic
and Stochastic Epidemics in Closed Populations’; W. F. Taylor, ‘Problems in Contagion’.
The level of these papers in this volume is high. They are written in a mathematical language which
all statisticians should be able to understand, and since there is a concrete problem on which the
mathematics is hung there is a sense of purpose about them which is often lacking in papers such as
those, for example, in Volume I. The fields of application are the traditional biometric ones and in
consequence this volume will be needed by many workers in the statistical field.
Volume V. Under the subtitle ‘Contributions to Econometrics’ there are papers by: K. J. Arrow and
L. Hurwicz, ‘Reduction of Constrained Maxima to Saddle-Point Problems’; E. W. Barankin, ‘Toward an
Objectivistic Theory of Probability’; C. W. Churchman, ‘Problems of Value Measurement for a Theory
of Induction and Decisions’; P. Suppes, ‘The Role of Subjective Probability and Utility in Decision-
Making’. Under the subtitle ‘Contributions to Industrial Research’ there are papers by: A. H. Bowker,
‘Continuous Sampling Plans’; C. Daniel, ‘Fractional Replication in Industrial Research’; M. Sobel,
‘Sequential Procedures for Selecting the Best Exponential Population’. Under the subtitle ‘Contributions
to Psychometry’ there are papers by: T. W. Anderson and H. Rubin, ‘Statistical Inference in
Factor Analysis’; F. Mosteller, ‘Stochastic Learning Models’; and H. Solomon, ‘Item Analysis and
Classification Techniques’.


F. N. DAVID


Fejezetek a klasszikus valószínűségszámításból. (Chapters on the classical probability
calculus.) By JORDAN KÁROLY (Charles Jordan). Budapest: Akadémiai Kiadó.
1956. Pp. 616. 120 forint.


... in their language, although unfortunately this means that oth...
...ed of the use of the book. CEDRIC A. B. AND PIRI SMITH


Mathematische Statistik (Band LXXXVII of the Grundlehren der Mathematischen
Wissenschaften series). By B. L. VAN DER WAERDEN. Berlin: Springer Verlag. 1957.
Pp. 360. D.M. 49.60 (£4. 9s.)


It is interesting to compare this ...
1956, and published b...
... more informal and less systematic ...
by going less deeply into the general theory. Thus, although he is care...
...our, the author treats the commoner, simpler and more frequently applied
tests and estimation problems. He opens with a simple presentation of Kolmogoroff's axioms and
follows with an examination of frequency distributions and the information contained in their means and 
variances. He then treats Kolmogoroff’s K and allied measures and develops the mathematical theory
of transformation of variates and characteristic functions. Next comes the theory of distributions 
derived from the normal followed by two short chapters on estimation by least squares and maximum 
likelihood and minimum χ². A chapter on bioassay intervenes before consideration of simpler tests and
the bare outlines of the Neyman-Pearson theory. The last two chapters are concerned with Wilcoxon’s
and allied tests and with correlation, both product moment and by rank methods. 
The book is beautifully produced, as indeed it should be at its formidable price. D. E. BARTON 


Statistical Methods in Research and Production with special reference to the 
Chemical Industry (third edition revised). Edited by O. L. DAVIES. Edinburgh:
Oliver and Boyd Ltd. 1957. Pp. 396. 45s. 


This book is the third edition of one first published in 1947, and runs to 390 pages against 286, while of 
eight authors listed five names are common to the seven listed in the first edition. (The reviewer has 
not seen the second edition). The potential reader should not be put off by the expression ‘with special 
reference to the chemical industry’ if that does not cover his own field, as all this means is that the 
examples are mostly chosen from chemical subjects. 

The book consists mainly of a practical exposition of the logic and application of the more elementary 
statistical methods. Personally I would consider that the treatment is weighted somewhat too much 
on the application side, since in my experience amateur statisticians rarely fail to apply methods
correctly, but are often weak in the logic of what they are doing. In any statistical method there is quite
a narrow range of readers between those who know it already and those who are unwilling to make the 
effort to learn it, and everyone tends to have different ideas on what the typical reader should be assumed 
to know, but I feel that actual readers will mostly be less logical and more arithmetical than is apparently 
assumed by the authors.

The methods of exposition and the notation are admirably clear. Trying out on myself a section 
describing a method which I have not used I had no difficulty at all in understanding and applying it. 

The only part of the book I disagree with to any serious extent is the statement on page 1 that 
‘statistics may be defined as the study of chance variations’. It is surely truer to say that statistics is 
the study of how to draw valid conclusions from numerical data in spite of interference from chance 
variations. 

The scatter diagram on page 191, said to represent a low correlation, in fact corresponds to one of
approximately 0.584.

The book may be strongly recommended to anyone, even with very little mathematical knowledge,
wishing to apply statistical methods to almost any form of industrial research or production.
L. McMULLEN


This book is extremely comprehensive although ...
only all the common methods of quality control, ...
required going somewhat further in this direction than wou... quality control. ...
chapters deal with statistical theory starting with measures of location, dispersion
... with the elements of probability theory, exemplified by the binomial theorem,
and leading up to the normal curve. This is followed by methods of estimation, tests of hypotheses and
... together with simple analyses of variance and tests for the homogeneity of variances.
The remaining chapters deal with quality control, interspersed with some chapters on
portions of statistical theory, such as frequency curves, the hypergeometric distribution and regression
...
... based on the combination of small samples, such as the deferre...
... both warning and outer limits in combination, suggested by
... W. J. Jennett, and further investigated by E. S. Page, are not discussed. ...
testing extreme values, the circular autocorrelation coefficient with lag 1 and certain percentag...
... Gram-Charlier curve.
The book would probably make a difficult course in statistics for a student to work through on his
own because it is very hard to see the wood from the trees. The student would also be left in
difficulty as to which of the many techniques available he should apply in any given practical problem.
The volume would, however, be very useful as a textbook for a course on quality control where the
teacher could sign-post the way and use much of the material in the book for illustration and
consolidation. As a refresher and reference work, too, the book should have an assured place. P. G. MOORE


Scientific Inference (second edition). By SIR HAROLD JEFFREYS. London: Cambridge
University Press. 1957. Pp. 236. 25s.


The first edition of this book was published in 1931, re-issued with addenda in 1937, and now is
presented largely re-written and developed. It was reviewed in this journal in 1932. To those familiar
... surprises; the author takes the point of view which we have come to expect. But whether we agree or
not with the ideas put forward on probability (direct and inverse), or sampling, or significance, to
mention a few among many topics, the writing is stimulating and a challenge to the reader. It is books
such as these which are the leaven in the dough of academic textbooks.


Introduction to Statistical Analysis (second edition). By WILFRID J. DIXON and
FRANK J. MASSEY JR. London: McGraw-Hill Publishing Co. Ltd. 1957. Pp. 481. 45s.


This is the second edition of a textbook which was reviewed by John Wishart in 1952 (Biometrika, 
Vol. 39). There are some changes and additions. The chapters on ‘Various Measures of Central Value 
and Dispersion’, on ‘Statistical Inference’ and on ‘Analysis of Variance’ have been re-written and 
enlarged. A small amount of material has been interpolated in other chapters, and a completely new 
chapter on probability has been added at the very end. The quantity of statistical tables given in the 
first edition was impressive and these have now been increased both in number and extent. The present 
writer would agree with Wishart that ‘the book under review compares favourably with others of its 
kind’. It is, however, somewhat stereotyped and uninspiring.